ABSTRACT


Recently, speech understanding research has taken a direction which recognizes
the importance of syntactic and semantic constraints as an essential part of the
process which deciphers speech signals into sequences of sounds (see Newell et al.
1973). Consequently, it has become important for speech researchers to be acquainted
with the work that has been done in the area of computational linguistics, which
attempts to construct computer programs to model the process of natural language
understanding. This paper attempts to provide an introduction to the techniques and
results which have come out of work in computational linguistics and which have
special relevance to the design of speech understanding systems. The paper was written
for an audience with some understanding of the nature of speech signals and the
difficulties of performing an acoustic and phonetic analysis of such signals but with
little familiarity with the techniques for parsing and semantic interpretation of
natural language or the ways in which such techniques could be used in a total speech
understanding system. However, readers with interests in computational linguistics,
linguistics, and artificial intelligence may also find the paper of interest.


This paper is not intended to be a survey. Rather, in it I will try to trace
the development of what I think are several important ideas and trends in parsing
and syntax and in semantic interpretation. I will attempt to convey a feeling for
what I think the state of the art is, how it developed conceptually, and some of the
new perspectives that the problems of speech understanding place on the processes of
parsing and semantic interpretation.
Introduction


Recently, speech understanding research has taken a
direction which recognizes the importance of syntactic and
semantic constraints as an essential part of the process
which deciphers speech signals into sequences of sounds (see
Newell et al. 1973). Consequently, it has become important
for speech researchers to be acquainted with the work that
has been done in the area of computational linguistics,
which attempts to construct computer programs to model the
process of natural language understanding. This paper will
attempt to provide an introduction to the techniques and
results which have come out of work in computational
linguistics which I think have special relevance to the
design of speech understanding systems. The paper was
written for an audience with some understanding of the
nature of speech signals and the difficulties of performing
an acoustic and phonetic analysis of such signals but with
little familiarity with the techniques for parsing and
semantic interpretation of natural language or the ways in
which such techniques could be used in a total speech
understanding system. However, readers with interests in
computational linguistics, linguistics, and artificial
intelligence may also find things of interest herein. For
the reader with little or no background in the nature of
speech production and the characteristics of speech signals,
I suggest the papers by Denes and Pinson (1963) and by
Jakobson, Fant and Halle (1967) as appropriate
introductions. This paper should be readable, however,
without such prior knowledge of speech characteristics.


This paper is not intended to be a survey. Rather, in
it I will try to trace the development of what I think are
several important ideas and trends in parsing and syntax and
in semantic interpretation. I will attempt to convey a
feeling for what I think the state of the art is, how it
developed conceptually, and some of the new perspectives
that the problems of speech understanding place on the
processes of parsing and semantic interpretation.
Part 1. Syntactic Analysis

There are two parts to the problem of syntactic
analysis -- one is a component of judgment or decision
(whether a given string of words is a sentence or not) and
the other is a component of representation or interpretation
(deciding what the pieces of the sentence are and how they
relate to each other). In speech understanding we will see
that both of these are important.

Let me start with a mini-history describing what I
think the current state of the art is, how it developed
conceptually, and some of the new perspectives that the
problems of speech understanding place on the evaluation of
parsing techniques.


Phrase Structure Grammars

The field of linguistics was given a great stimulus
when the two aspects of syntax (judgmental and structural)
were combined in the formalism of phrase structure grammar.
Prior to this development, largely due to Chomsky (e.g.
Chomsky, 1965), the mechanism whereby a computer program
could decide whether a given sequence of words was a
grammatical sentence or not would have been difficult to
imagine.

The principal component of a phrase structure grammar
is a collection of "rewrite rules" such as the following:

S -> NP VP
NP -> DET N
VP -> V NP

Intuitively, the first rule indicates that a sentence can
consist of a noun phrase followed by a verb phrase.
Formally, it indicates that in the course of deriving a
sentence, one can replace an occurrence of the symbol S in
the string derived so far with the sequence of two symbols
NP VP. Similarly, one can replace the NP with the sequence
DET N and the VP with the sequence V NP, ultimately deriving
the sequence DET N V DET N, which is the sequence of
syntactic word categories underlying a sentence such as

The man bit the dog.
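
The rewriting process just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names RULES and derive are hypothetical, and the sketch always rewrites the leftmost rewritable symbol, which is one of several possible orders.

```python
# The three rewrite rules from the text; each nonterminal has one expansion.
RULES = {
    "S": ["NP", "VP"],
    "NP": ["DET", "N"],
    "VP": ["V", "NP"],
}

def derive(start="S"):
    """Return the successive working strings of a derivation from `start`."""
    working = [start]
    steps = [list(working)]
    while True:
        # Rewrite the leftmost symbol that still has a rewrite rule.
        for i, symbol in enumerate(working):
            if symbol in RULES:
                working[i:i + 1] = RULES[symbol]
                steps.append(list(working))
                break
        else:
            return steps  # only word categories remain

for step in derive():
    print(" ".join(step))
```

The last line printed is DET N V DET N, the category sequence underlying the example sentence.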




BBN Report No. 3067


Bolt Beranek and Newman Inc.


Parsers and Recognizers

The rewrite rules of a phrase structure grammar can be
used to characterize the set of possible sequences of words
which can be considered grammatical sentences, thereby
formally representing the judgmental part of syntax. A
formal algorithm for taking a grammar and deciding whether a
sequence of words is a sentence with respect to that grammar
is called an acceptor or a recognizer.

If in the course of deriving a sentence according to
the rules we keep track of which symbols were rewritten into
which sequences, we can construct a tree structure such as
that represented in Figure 1, which gives a very nice
representation of what the parts of the sentence are and how
they are put together, thus achieving a structural
representation of the sentence. An algorithm for
constructing such a representation while accepting or
recognizing a sentence is called a parser.


Figure 1: A Sample Phrase Structure Tree


Lexical Categories and Dictionaries

Notice that in Figure 1 and in the grammar rules there
are two different kinds of names of nodes; there are
"nonterminal" symbols like S, NP, and VP, which name whole
phrase types, and there are other symbols which are
essentially lexical word class names, like determiner, noun,
and verb. This distinction between terminal and nonterminal
symbols is formalized by dividing the vocabulary of special
symbols of a phrase structure grammar into a terminal and
nonterminal vocabulary. The initial symbol S, and all of
the symbols which later get rewritten by phrase structure
rules are in the nonterminal vocabulary. The derivation of
a sentence stops when the string consists entirely of
terminal symbols. In a simple view of phrase structure
grammars, the terminal symbols would be the English words
themselves, but this would result in a huge set of
"singleton" rules such as:

DET -> the

(on the average there would be several such rules for each
word in English). Instead, the syntactic word classes
usually serve as the terminal vocabulary and the
correspondence between syntactic word classes and the words
themselves is taken care of by a dictionary.
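
As a rough illustration of how a dictionary can stand between the words and the terminal vocabulary, the following Python sketch maps each word to its possible word classes and lists the category sequences a sentence could have. The dictionary entries and the names DICTIONARY and category_sequences are assumptions made up for this example, not part of any system described here.

```python
# Hypothetical dictionary: each word may belong to several word classes.
DICTIONARY = {
    "the": {"DET"},
    "man": {"N", "V"},
    "bit": {"V", "N"},
    "dog": {"N", "V"},
}

def category_sequences(sentence):
    """List every sequence of word classes the dictionary allows."""
    sequences = [[]]
    for word in sentence.lower().split():
        classes = sorted(DICTIONARY[word])
        # Extend every partial sequence with every class of this word.
        sequences = [seq + [c] for seq in sequences for c in classes]
    return sequences

candidates = category_sequences("The man bit the dog")
print(len(candidates))                                # 8
print(["DET", "N", "V", "DET", "N"] in candidates)    # True
```

Only one of the eight candidate sequences is the one derived by the grammar above; choosing among them is part of the parser's job.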


Other Grammar Models

All of the above presentation has been a description of
what is called context free phrase structure grammars.
There are in fact many different types of phrase structure
grammars depending on the types of rules permitted and the
way that they are applied. For each different type of
grammar, there is a corresponding class of languages which
can be characterized by grammars of that type; the grammar
formalism is said to generate this class of languages.
Whenever two formalisms, either grammars or automata,
generate the same class of languages, they are said to be
generatively equivalent or equivalent in generative power,
and if one formalism generates a superset of the class
generated by another formalism, then that model is said to
be stronger in generative power. There is a well-known
hierarchy of successively more powerful phrase structure
grammar models, known among formal language theorists as the
Chomsky hierarchy. I would like to introduce these here
because I want to come back occasionally and refer to the
various things which the different models can do.

The grammar models in the Chomsky hierarchy are known
as type 0, type 1, type 2, and type 3 grammars. The context
free grammar which we have just described is the type 2
grammar and is characterized by the fact that the left-hand
sides of its rewrite rules consist of a single nonterminal
symbol and the right-hand sides may be any nonempty string
of terminal and nonterminal symbols. The type 3 grammars,
also known as finite state grammars, are more restricted
than the context free grammars and correspond in generative
power to finite state machines. They are characterized by
rewrite rules whose left-hand sides are single nonterminals
and whose right-hand sides are either a single terminal
symbol or a terminal symbol followed by a single
nonterminal.

At the other end of the spectrum are the type 0
grammars, also known as general rewriting systems, which
correspond in generative power to Turing machines. General
rewriting systems are characterized by rewrite rules whose
left-hand and right-hand sides can be arbitrary strings of
terminal and nonterminal symbols subject only to the
constraint that terminal symbols cannot be rewritten as some
different terminal or nonterminal symbol. Type 1 grammars,
also known as context sensitive grammars, are strictly less
powerful than general rewriting systems and strictly more
powerful than context free grammars. They are characterized
by rewrite rules in which the left-hand sides specify not
only a nonterminal symbol to be rewritten, but also a
context of terminal and nonterminal symbols which must be
present in order for the rule to be applied.

Figure 2 gives a summary of the types of rules for each
class of grammars.

In the figure, the notation V is used to represent the
union of the terminal and nonterminal vocabularies of the
grammar (Vt and Vn), and the * operator is used to indicate
the set of all possible strings which can be made from a
given vocabulary (i.e. Vt* indicates the set of all
possible terminal strings). The symbol e represents the
empty string (i.e. the string with no symbols).
TYPE 0: GENERAL REWRITING SYSTEM

    α -> β                    α, β ∈ V*

TYPE 1: CONTEXT SENSITIVE

    X -> γ / α _ β            X ∈ Vn
      or                      α, β, γ ∈ V*
    α X β -> α γ β            γ ≠ e

TYPE 2: CONTEXT FREE

    X -> γ                    X ∈ Vn, γ ∈ V* - {e}

TYPE 3: FINITE STATE

    X -> a Y                  X, Y ∈ Vn
                              a ∈ Vt


Figure 2: Summary of the Chomsky Hierarchy
of Phrase Structure Grammars
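
The rule forms of the four grammar types can be checked mechanically. The following Python sketch classifies a rule by the most restricted type whose form it fits, following the descriptions in the text (for type 3 it accepts both a single terminal and a terminal followed by a nonterminal). The function names, the tuple representation of rules, and the sample rules are assumptions for illustration only.

```python
def rule_type(lhs, rhs, nonterminals):
    """Return the most restricted type (3, 2, 1 or 0) whose rule form fits.

    `lhs` and `rhs` are tuples of symbols; any symbol not listed in
    `nonterminals` is treated as a terminal.
    """
    single_nt = len(lhs) == 1 and lhs[0] in nonterminals
    # Type 3: X -> a  or  X -> a Y
    if single_nt and len(rhs) in (1, 2) and rhs[0] not in nonterminals:
        if len(rhs) == 1 or rhs[1] in nonterminals:
            return 3
    # Type 2: a single nonterminal rewritten as a nonempty string
    if single_nt and len(rhs) >= 1:
        return 2
    # Type 1: alpha X beta -> alpha gamma beta, with gamma nonempty
    for i, symbol in enumerate(lhs):
        if symbol not in nonterminals:
            continue
        left, right = lhs[:i], lhs[i + 1:]
        gamma_len = len(rhs) - len(left) - len(right)
        if (gamma_len >= 1 and rhs[:len(left)] == left
                and rhs[len(rhs) - len(right):] == right):
            return 1
    return 0

def grammar_type(rules, nonterminals):
    """A grammar is of type n if every one of its rules fits type n."""
    return min(rule_type(lhs, rhs, nonterminals) for lhs, rhs in rules)

NT = {"S", "NP", "VP", "X", "Y"}
sentence_rules = [
    (("S",), ("NP", "VP")),
    (("NP",), ("DET", "N")),
    (("VP",), ("V", "NP")),
]
print(grammar_type(sentence_rules, NT))                          # 2
print(rule_type(("X",), ("DET", "Y"), NT))                       # 3
print(rule_type(("DET", "X", "N"), ("DET", "Y", "Y", "N"), NT))  # 1
print(rule_type(("X", "Y"), ("Y",), NT))                         # 0
```

Taking the minimum over the rules works because each higher-numbered rule form is a special case of the lower-numbered ones.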


Each of the grammars in the Chomsky hierarchy
represents a restriction in generative power (with an
attendant ease in parsing or recognition) over the power of
grammars with a lower number. Each class with a higher
number represents a special case of the classes with lower
numbers. The principal difference between the context
sensitive grammar and the general rewriting system is that
the former is prohibited by the nature of its rules from
erasing anything from the working string as it proceeds
(i.e. the right-hand sides of rules are always at least as
long as the left-hand sides). For the general rewriting
systems, this is not the case, and arbitrary amounts of
intermediate "scratch work" can be erased out of a
derivation without leaving a trace in the resulting string
that is generated. This is what gives the general rewriting
system its power, and also has the undesirable consequence
that a recognition or parsing algorithm cannot be guaranteed
to exist for general rewriting systems. For all of the
other classes of grammars, it is possible to construct a
recognizer which for an arbitrary string will say yes-or-no
whether that string is in the language generated by a given
grammar. General rewriting systems are therefore not very
desirable as machine models of language due to this
inability to guarantee a recognition algorithm.


Derivations 

For each of the type 1, 2, and 3 grammars, formal
parsing algorithms can be devised which, given a grammar and
a string, can answer the question whether the string is a
sentence with respect to the grammar. This is done by
attempting to discover a derivation of the string from the
initial symbol of the grammar by means of the rewrite rules.
A derivation is essentially a sequence of working strings
starting with the initial symbol, each of which results from
the preceding one by one application of a rewrite rule. A
string is said to be generated by the grammar if there is a
derivation of the grammar leading to it.

Figure 3 gives a sample derivation of the sentence in
Figure 1.
SUMMARY OF DERIVATION

    S =>* DET N V DET N

INTERMEDIATE STRINGS

    S
    NP VP
    DET N VP
    DET N V NP
    DET N V DET N

Figure 3: A Sample Derivation


Notice however that there can be several distinct
derivations for a single phrase structure tree corresponding
to different orders of applying the rewrite rules. For
example, if instead of expanding the subject noun phrase
before the verb phrase one were to expand the verb phrase
first, one of the derivations of Figure 4 would result.
(Figure 4 compactly represents all of the possible
derivations of this particular surface string, with the
common initial parts of different derivations combined.
Alternative choices for expanding a given string are
indicated by the arrows, and individual derivations are
terminated by underlining.)




S =>* DET N V DET N

S -> NP VP -+-> DET N VP -> DET N V NP -> DET N V DET N
            |                             -------------
            +-> NP V NP -+-> DET N V NP -> DET N V DET N
                         |                 -------------
                         +-> NP V DET N -> DET N V DET N
                                           -------------

Figure 4: Alternative Derivations of the Sentence
from Figure 3


Essentially all of the expansion that appears in the phrase
structure tree could be done in any order and each different
ordering would give a different derivation which corresponds
to effectively the same parse. If we don't want to be
swamped with alternative derivations of the same parse, then
we need to include in our parsing algorithm some control
strategy that will keep it from getting all of them. The
typical control strategy that is used in text-based parsers
(as opposed to speech) is to decide arbitrarily that the
only derivations which will be considered will be those
which expand at each step the leftmost nonterminal in the
string. This effectively selects one canonical derivation
for each possible parse tree. This makes the derivation
shown in Figure 3 the canonical one, and the other two that
are shown in Figure 4 are not found.
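
The effect of the leftmost-expansion strategy can be sketched in Python by enumerating derivations with and without the restriction. The function name derivations and its leftmost_only flag are hypothetical, and the sketch relies on the simplifying assumption that each nonterminal of the example grammar has exactly one rule.

```python
RULES = {"S": ["NP", "VP"], "NP": ["DET", "N"], "VP": ["V", "NP"]}

def derivations(working, leftmost_only=False):
    """Yield every derivation (a list of working strings) from `working`."""
    positions = [i for i, s in enumerate(working) if s in RULES]
    if not positions:
        yield [working]          # no nonterminals left: derivation complete
        return
    if leftmost_only:
        positions = positions[:1]   # the control strategy from the text
    for i in positions:
        expanded = working[:i] + RULES[working[i]] + working[i + 1:]
        for rest in derivations(expanded, leftmost_only):
            yield [working] + rest

all_orders = list(derivations(["S"]))
canonical = list(derivations(["S"], leftmost_only=True))
print(len(all_orders), len(canonical))   # 3 1
```

All three unrestricted derivations end in the same string DET N V DET N; the restriction keeps only the one that always expands the leftmost nonterminal.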


The Roots of Nondeterminism

The control strategy which we have just described is
very simple to state in terms of a generative rule, but if
one wants to use it for an analysis algorithm, it seems to
suggest the following analysis strategy: as you start
scanning along the string, as soon as you find a piece that
matches the right-hand side of some rule, then you can
collapse that into a single constituent. However, this
strategy will not work in general, as we can illustrate with
the grammar of Figure 5. This figure illustrates a very
simple grammar for arithmetical expressions. In it, an
expression (E) can be a term (T) plus a term or can be just
a single term. Likewise a term can be a factor (F) times a
factor or just a single factor, and factors can be any of
the symbols a, b, or c. Figure 6 shows the structure that
we would like to get as a parsing of the string "a + b*c".


E -> T + T

E -> T

T -> F * F

T -> F

F -> a, b, c


Figure 5: A Simple Grammar for Arithmetic Expressions
The way that we have written the rules of the grammar forces
on us the priority that the product comes first and then the
sum. (A slightly more expanded grammar would include
parentheses to enable one to express the other
interpretation if that was what was intended.) Now suppose
we took this string of characters and the context free rules
of Figure 5 and started doing reductions on the string
wherever we could. We could reduce the a to an F and then
to a T, then we'd have to go on to the + which can't reduce
by itself. We could reduce the b to an F and then to a T
and then we could reduce the T + T to a single E. After
that we would reduce the c to an F and then to a T and after
that we would be stuck because there is no rule which will
reduce E * T to anything. The structure that we have built
when we come to the impasse is shown in Figure 7.


          E
        / | \
       T  |  T     T
       |  |  |     |
       F  |  F     F
       |  |  |     |
       a  +  b  *  c

Figure 7: A Blocked Reverse Derivation from "a+b*c"
Essentially, in order to obtain the parse tree in Figure 6,
it is necessary not to go ahead and reduce the second F to a
T. Instead we must postpone that until reducing the c to an
F and reducing the F * F to a single T, after which the
T + T can be reduced to a single E.

So one finds that when one is attempting to do an
analysis there is a fundamental nondeterminism that must be
provided for. One comes to a place where a rule could be
applied, but doesn't know whether that's the right place to
make the reduction or not. It is necessary to consider both
alternatives, follow out the rest of the analysis, and see
which one (if any) of the alternative sequences of choices
will give a complete parse tree. This is the first of
several sources of nondeterminism in both text and speech
parsing.
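
The impasse described in this section can be reproduced with a small Python sketch of the "reduce as soon as anything matches" strategy. The name greedy_reduce is hypothetical. One assumption is made so that the trace follows the steps narrated in the text: the rule E -> T is left out, since applying it eagerly would turn the first T into an E at once (the strategy then blocks as well, only in a different configuration).

```python
# Rules of the arithmetic grammar as (left-hand side, right-hand side).
# Assumption: E -> T is omitted so the trace matches the narrated steps.
RULES = [
    ("E", ("T", "+", "T")),
    ("T", ("F", "*", "F")),
    ("T", ("F",)),
    ("F", ("a",)),
    ("F", ("b",)),
    ("F", ("c",)),
]

def greedy_reduce(tokens):
    """Shift symbols one at a time, reducing whenever any rule matches."""
    stack = []
    for token in tokens:
        stack.append(token)
        reduced = True
        while reduced:
            reduced = False
            for lhs, rhs in RULES:
                # Does the top of the stack match this right-hand side?
                if tuple(stack[-len(rhs):]) == rhs:
                    stack[-len(rhs):] = [lhs]
                    reduced = True
                    break
    return stack

print(greedy_reduce("a+b*c"))   # ['E', '*', 'T']
```

The result is the blocked string E * T rather than a single E: the b was reduced to a T too early, and no rule can repair that afterwards.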


Nondeterministic Algorithms

There are many applications in computer science,
especially in artificial intelligence and language
processing, where systematic search in a space of
alternative possible choices is required. A conceptual
device for devising algorithms for such tasks is the notion
of a nondeterministic algorithm or nondeterministic machine.
By this, we do not refer to an algorithm whose behavior is
unpredictable, but rather to an abstract algorithm in which
there is a primitive choice operation which can make one of
several choices. This algorithm is then simulated on a real
machine by systematically considering all possible sequences
of alternative choices of the abstract nondeterministic
algorithm. The nondeterministic machine is a conceptual
device to enable the writer of a grammar or other such
search algorithm to think of the machine as if it were
magically making the right choices, freeing him from
explicitly keeping track of the alternative choices and
cycling through them. One says that a string is accepted by
a nondeterministic algorithm if any of the alternative
computation paths leads to a successful analysis.

The first fundamental idea that I would like you to
remember is this notion of a nondeterministic algorithm as a
device for coping with this type of search in a space of
alternative possibilities.
Backtracking vs Parallel Search

There are two principal ways of writing simulators for
nondeterministic programs -- one is called backtracking and
its effect is that whenever the program is about to make a
choice, it saves somewhere (usually on a pushdown stack) all
of the information that is about to be destroyed by the
choice so that the simulator can come back later, undo it,
and try another choice. The program then parses like a
deterministic parser until it encounters a blocked
configuration such as the one in Figure 7, at which point it
undoes the last choice made and tries the next possible
alternative. If there is no other alternate choice, then it
undoes the next to last choice, and so on until all possible
choice sequences have been considered. Floyd (1967) gives
an efficient general technique for implementing backtracking
simulators for nondeterministic algorithms. In the case of
Figure 7, the result of backtracking would be to undo the
last reduction of F to T. Finding nothing else to do, the
parser then would undo the reduction of c to F, then undo
the reduction of T + T to E, and eventually would back up to
the point where the b had been reduced to F, but that F had
not been reduced to T. The parser could then go on to
reduce the c to an F (a second time -- this was done before
on the blocked path) and then reduce the F * F to T, which
puts us on the right path for the correct analysis.

A backtracking algorithm does its search by
systematically working on one path of the nondeterministic
algorithm, saving enough to undo it later. The systematic
way in which it walks through the space of possible choices
is called "depth first". That is, after making one choice,
it proceeds to make a choice that depends on that one, and
another that depends on that, and so on, building up a stack
of other untried alternatives at different "depths". Only
when it encounters a blocked configuration does it undo the
most recently made choice, and it tries all possible choices
at that "depth" before backing up to the next previous level
on the stack of alternatives. If the space of alternative
choice sequences were laid out as a tree, the backtracking
search would correspond to a left-first tree walk.
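
A depth-first backtracking simulator of this kind can be sketched in Python for the arithmetic grammar. This is a minimal illustration, not the technique of Floyd (1967): the name parse and the textual trace format are assumptions, and the recursion of the language plays the role of the pushdown stack of saved choices (returning None corresponds to undoing the last choice).

```python
RULES = [
    ("E", ("T", "+", "T")),
    ("E", ("T",)),
    ("T", ("F", "*", "F")),
    ("T", ("F",)),
    ("F", ("a",)),
    ("F", ("b",)),
    ("F", ("c",)),
]

def parse(stack, rest, trace=()):
    """Depth-first search for a sequence of moves reducing the input to E."""
    if stack == ("E",) and not rest:
        return list(trace)
    # Try, in order, each reduction that applies to the top of the stack.
    for lhs, rhs in RULES:
        if stack[-len(rhs):] == rhs:
            new_stack = stack[:-len(rhs)] + (lhs,)
            move = f"reduce {' '.join(rhs)} to {lhs}"
            found = parse(new_stack, rest, trace + (move,))
            if found is not None:
                return found
    # Last alternative: shift the next input symbol instead of reducing.
    if rest:
        move = f"shift {rest[0]}"
        return parse(stack + (rest[0],), rest[1:], trace + (move,))
    return None  # blocked: the caller backs up and tries its next choice

moves = parse((), tuple("a+b*c"))
print("\n".join(moves))
```

The search first explores the blocked paths in which b is reduced all the way to a T, backs out of them, and eventually finds the path on which F * F is reduced to T before T + T is reduced to E.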

Another way of handling nondeterminism is by what I'll
call independent alternatives. In such a program, every
time that you are about to make a choice, you create an
object for each of the possible choices. This object
corresponds to a state or configuration of the hypothetical
nondeterministic machine which you are simulating. In a
real machine, a configuration is basically the contents of
the program counter and the register contents; in the
simulation of a nondeterministic machine there are many such
configurations instead of just one. (This is similar to
what goes on in a time sharing system.) For a
nondeterministic finite state machine, the configuration is
basically the state that you are in and the point in the
input string that you have gotten to. In programming a
system for handling independent alternatives, every time
that you come to a choice point, you make up as many
configurations as there are alternative choices, and you are
now free to work on those configurations all in parallel (or
"breadth first") or you can jump around from one to another
(working on the ones that seem most likely of success before
working on others, for example). With multiple independent
alternatives, you can pick up a configuration, determine its
state, look at where it is in the input, compute the next
configurations which it could get to, and then go to another
configuration (which may or may not be one of the ones you
just created). Just like a time-sharing system you can run
a lot of these configurations in pseudo parallel with
varying priorities for service.
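
For the finite state case, the idea of independent alternatives can be sketched in Python as an agenda of configurations, each a pair of a state and a position in the input. The machine below, its state names, and the function accepts are all made up for illustration. The agenda is served first-in first-out, which gives the breadth-first order; replacing it with a priority queue would give the "most likely first" order, under whatever scoring one assumes.

```python
from collections import deque

# Hypothetical nondeterministic finite state machine over word classes:
# TRANSITIONS[state][symbol] is the set of possible next states.
TRANSITIONS = {
    "start": {"DET": {"np"}},
    "np": {"N": {"np", "subj"}},        # choice point: more nouns or done
    "subj": {"V": {"verb"}},
    "verb": {"DET": {"objnp"}},
    "objnp": {"N": {"objnp", "done"}},  # another choice point
}
FINAL = {"done"}

def accepts(symbols):
    """Simulate the machine by working through an agenda of configurations."""
    agenda = deque([("start", 0)])      # configuration = (state, position)
    seen = set(agenda)
    while agenda:
        state, pos = agenda.popleft()   # pick up some configuration
        if pos == len(symbols):
            if state in FINAL:
                return True
            continue
        # Create one new configuration per alternative choice.
        for nxt in TRANSITIONS.get(state, {}).get(symbols[pos], ()):
            config = (nxt, pos + 1)
            if config not in seen:
                seen.add(config)
                agenda.append(config)
    return False

print(accepts("DET N V DET N".split()))     # True
print(accepts("DET N N V DET N".split()))   # True
print(accepts("DET V N".split()))           # False
```

Because every configuration is a self-contained object, the simulator is free to take them up in any order, and nothing has to be undone.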

There is a tremendous advantage for speech
understanding and also in text parsing for implementing
nondeterministic programs in terms of independent
alternatives rather than backtracking. With independent
alternatives, if you are in a position where it is difficult
to decide which of the alternative choices is the best to
follow, it is possible for you to follow several parsings in
parallel, or to jump from one to another depending on which
looks better at any given moment. In the backtracking
approach, one has to systematically walk down a long path
into barren territory before he can walk back to the place
where the next best choice is. The only way to go back and
consider one of the alternatives to a choice is to plow on
ahead to completely search the space on the current path
exhaustively and then back up out of it. Once one has left
a given path it is not possible to come back to it and push
it further. Even in the simple illustration of backtracking
for the example in Figure 7, there were two or three things
of an unimaginative nature that had to be undone before
getting back to where the right alternative choice had to be
made. In more complicated examples, the amount of such
"wasted" or uninteresting parts of the space that have to be
searched before one can get back to the correct place to
make an alternative choice can be astronomical.

I will make a pitch then for a second fundamental idea
which you should know about -- namely this difference
between systematic backtracking and the following of
multiple independent alternatives.
Bottom Up, Top Down, Predictive, and Nonpredictive Parsing

The algorithm that we described above for finding a
derivation of a given string by reversing the generative
rules of the grammar is an algorithm that is referred to as
"bottom-up". That is, we look into the input string or the
current working string until we find something that matches
the right-hand side of some grammar rule, and then reduce
that matching portion by replacing it with the left-hand
side of the rule. (I'm assuming a context free grammar here
for simplicity.) We apply this process over and over again
until we finally reduce the entire string to a single
symbol. (At least the goal we are trying to achieve is such
a reduction of the string into a single symbol.) Notice that
in the statement we have just made, we have not specifically
mentioned the systematic consideration of each of the
possible rules that could have applied at each step and the
different positions in the working string where rules could
have been applied. It is exactly this freedom from
consideration of detail that is achieved by thinking of the
process as a nondeterministic algorithm. Of course the
details need to be considered eventually in order to make
the algorithm function on a real machine, but these
considerations can be made separately and they can be done
once and for all for a parsing system and not have to be
redone separately for each grammar or version of a grammar
which is written.

There is another kind of parsing algorithm at the other
extreme which is called "top-down". It gets this name
because it starts by expanding the grammar rules "from the
top" and only looks for comparison at the input string when
a terminal symbol appears in the expansion. A simple
version of a top-down parser makes use of a pushdown store
into which the initial symbol of the grammar is placed
before parsing begins. Subsequently the algorithm proceeds
as follows: If the topmost symbol on the stack is a
nonterminal, then some rule of the grammar with that
nonterminal as its left-hand side is selected (another
nondeterministic choice) and the topmost symbol of the
pushdown stack is replaced with the symbols from the
right-hand side of the rule (so the leftmost symbol of the
right-hand side is now the topmost symbol of the stack). If
the topmost symbol of the stack is a terminal symbol, then
it is compared with the next unused symbol of the input
string. If they are the same then the topmost symbol of the
stack is removed and the string is advanced. If they do not
match then this configuration is blocked -- i.e. this path
of the nondeterministic search is terminated. The string is
accepted if the pushdown stack becomes empty at the same
time that the last symbol of the input string is used.
(Note again our use of the nondeterministic algorithm to
simplify the explanation. In an actual parsing algorithm,
all possible choices of expanding the topmost nonterminal of
the stack are pursued and the string is accepted if any of
the alternative computation paths leads to the accepting
criterion.) An example of a top-down analysis using a
pushdown store is shown in Figure 8. (Here the rectangular
enclosure represents the pushdown store, the arrows the
steps in the analysis, and the plus sign indicates the
consumption of a symbol from the input string by a given
stack configuration.)


Figure 8: A Sample Top-down Predictive Analysis using
a Pushdown Store
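
The pushdown store procedure just described can be sketched in
Python as follows. This is a minimal sketch under stated
assumptions: the toy grammar and the names GRAMMAR and accepts are
hypothetical, and the nondeterministic choice is simulated by
keeping an agenda of all pending configurations. The sketch would
not terminate on a grammar with left recursion.

```python
# Hypothetical toy grammar (an assumption for illustration):
# nonterminal -> list of alternative right-hand sides.
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("DET", "N")],
    "VP": [("V", "NP")],
    "DET": [("the",)],
    "N": [("man",), ("dog",)],
    "V": [("bit",)],
}


def accepts(words, start="S"):
    words = tuple(words)
    # A configuration is (stack, input position); stack[0] is the top.
    agenda = [((start,), 0)]
    while agenda:
        stack, pos = agenda.pop()
        if not stack:
            if pos == len(words):
                return True       # stack empty exactly at end of input
            continue              # blocked: input symbols left over
        top, rest = stack[0], stack[1:]
        if top in GRAMMAR:
            # Nonterminal on top: pursue every possible rule choice.
            for rhs in GRAMMAR[top]:
                agenda.append((rhs + rest, pos))
        elif pos < len(words) and words[pos] == top:
            # Terminal on top matches the next unused input symbol.
            agenda.append((rest, pos + 1))
        # Otherwise this configuration is blocked and is dropped.
    return False
```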


The Harvard Predictive Analyzer

The original Harvard Predictive Analyzer (Kuno and
Oettinger, 1963) does a slightly more optimized version of
the top-down technique just described. It works with a
grammar which has been transformed so that all of its rules
have a terminal symbol as the first symbol of their
right-hand sides. Thus at every step of the pushdown store
analysis the algorithm consumes a symbol from the input
string, and the number of steps in a given computation path
of the nondeterministic machine is at most n, where n is the
length of the input string. (Of course the number of steps
of the real computer in simulating the nondeterministic
algorithm is much greater since it has to follow out all
possible alternative computation paths.) An additional
advantage of the special form of the context free rules used
by the predictive analyzer (known as Greibach normal form,
or standard form) is that it eliminates the possibility of
infinite loops due to the symbol on top of the pushdown
stack expanding into a string which eventually results in a
new instance of the same symbol on top of the stack without
advancing the input string. An algorithm due to Greibach
(Greibach, 1967) which converts an arbitrary context free
grammar into a standard form grammar finds and eliminates
the possibility of such "left-recursion".
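
To make the notion of left recursion concrete, here is a small
Python sketch that reports which nonterminals can reappear on top
of the stack without any input being consumed. It is not
Greibach's conversion algorithm, only a check; the function name
and the example grammars are hypothetical, and the sketch assumes
that no rule has an empty right-hand side.

```python
def left_recursive_symbols(grammar):
    """Return the nonterminals that can expand into a string that
    begins with the same nonterminal.

    `grammar` maps each nonterminal to a list of right-hand sides
    (tuples of symbols). Assumes no empty right-hand sides.
    """
    # first[a] = symbols that can be leftmost after expanding `a`
    # one or more times.
    first = {a: {rhs[0] for rhs in rules} for a, rules in grammar.items()}
    changed = True
    while changed:
        changed = False
        for a in grammar:
            for b in list(first[a]):
                if b in grammar:
                    new = first[b] - first[a]
                    if new:
                        first[a] |= new
                        changed = True
    return {a for a in grammar if a in first[a]}


# A rule such as NP -> NP PP is directly left-recursive.
recursive = {"NP": [("NP", "PP"), ("the", "N")], "PP": [("in", "NP")],
             "N": [("dog",)]}
# In a standard form grammar every rule begins with a terminal,
# so no nonterminal can be left-recursive.
standard = {"NP": [("the", "N")], "N": [("dog",)]}
```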


Predictive vs Nonpredictive Parsing

There has been a great deal of discussion in the
parsing literature about the differences between top-down
and bottom-up algorithms. An example is a paper by
Griffiths and Petrick (1965) which characterizes several
varieties of each type. However, in recent years there have
been a number of parsing algorithms developed which don't
fit into either of these broad categories, and I think that
the classical distinction between top-down and bottom-up is
becoming very fuzzy. The distinction which I think is more
important -- a distinction which is correlated with the
top-down bottom-up distinction for the two simple algorithms
presented -- is the distinction between predictive and
nonpredictive parsing. A predictive parser is one that will
only look at a given point in the input string for things of
a sort that it expects to see there, whereas a nonpredictive
parser will find a given construction only as a function of
the constituents which make it up, irrespective of whether
such a constituent is compatible with an analysis of the
symbols on either side of it in the input string. For
example, an inherent feature of the top-down pushdown store
algorithm which I presented above is that at each point in
the analysis there exists on the stack a prediction of the
types of phrases which are expected to occur to the right of
the current point in the input string. As the algorithm
operates, only those constituents will be looked for.
Contrast this with the situation in the simple bottom-up
algorithm. There, if the terminal symbols could be grouped
together to form some constituent, then that alternative
would be tried regardless of whether there is an analysis of
the symbols to the left with which this constituent could
combine.

The predictive parsing technique has an advantage for
most parsing applications since it considerably reduces the
number of applications of rules that have to be considered
and the number of "accidental" constituents that are found
(i.e. sequences of words that could make up a constituent
in some other context but which are not a constituent of any
complete analysis of the current string). For example, in a
predictive analysis of "the man bit the dog", using the
grammar of Figure 1, the parser looks for a noun phrase at
the beginning of the sentence because the grammar says that
sentences can begin with noun phrases. However, once it has
found the subject noun phrase, it doesn't try to look for a
noun phrase at the place that starts with "bit" because
there is no grammar rule which would use a noun phrase at
that point. In the bottom-up approach, all rules are
attempted everywhere since there is no prediction. Not only
does this result in more rules that have to be tried, but it
also results in more spurious matches that don't lead to
correct parsings.

For parsing text in the form of sequences of words,
there is a great advantage to using the predictive algorithm
because it follows fewer blind alleys. On the other hand,
there is a problem in continuous speech understanding which
reduces its advantage. In continuous speech understanding,
there is a fairly high probability that your guess for the
word at any given point in the string may be wrong. This is
especially true of the first and last word in the sentence
due to phonological effects at the beginning and ends of
utterances. If your guess of the first word is wrong, then
all of your predictions later will be influenced by it, and
if it induces you to only look for those things that will be
consistent with that wrong word, then you may never recover
the right parse. The nonpredictive parser that goes up and
down the string doing everything it can stands a better
chance of recovering from such errors. Specifically, it
stands a better chance of finding most of the parse in spite
of a wrong or missing word. It can then provide this
information as a source for prediction as to what the
missing word might be or what kind of word is required in a
given region.

Another point then that I would like to make is this
tradeoff between predictive and nonpredictive parsing
algorithms for speech understanding. I don't want to make a
strong case that one or the other is better; I want to give
a feeling for what the tradeoffs are between the two
algorithms. The predictive one will do a more selective
search, and if one is confident that the things on which it
is basing its predictions are right, then it is preferable.
On the other hand, if there's a high chance that they're
wrong, then the disadvantage is that the prediction may keep
you from finding enough of the correct parse to be a useful
source of information for error correction.


Well-formed Substring Tables

One thing that was found very early in the development
of parsing algorithms, especially with the enumerative
top-down, predictive algorithms, is that when the different
computation paths are done separately, duplicated work is
done on the separate paths. For example, if two possible
ways of analyzing the beginning of a sentence cause the
analysis to split up into two different computations, the
entire remaining analysis will be done twice, even though it
may be the same in both cases. A "well-formed substring
table" is a mechanism for saving the results of the analysis
of a constituent on one path of the computation so that they
can be used on the other paths without redoing the
computation. Whenever, in the course of the analysis, a
complete constituent is found, it is recorded in a table
indexed by the type of constituent and the position where it
begins. Whenever the algorithm is about to predict a
constituent of a given type at a given position, it consults
the well-formed substring table to see if such a constituent
has already been found, and if so the results are used
without recomputation.
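
A minimal Python sketch of this idea follows. The table is indexed
by constituent type and starting position, as described above, and
stores the set of positions where such a constituent can end. The
grammar and the names GRAMMAR, make_recognizer and ends are
assumptions made for this example, and the sketch assumes a
grammar without left recursion.

```python
# Hypothetical toy grammar (an assumption for illustration).
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("DET", "N")],
    "VP": [("V", "NP")],
    "DET": [("the",)],
    "N": [("man",), ("dog",)],
    "V": [("bit",)],
}


def make_recognizer(grammar, words):
    # (constituent type, start position) -> set of end positions
    table = {}

    def ends(symbol, start):
        """Return every position where `symbol` can end if it
        begins at `start`."""
        if symbol not in grammar:                     # terminal symbol
            ok = start < len(words) and words[start] == symbol
            return {start + 1} if ok else set()
        key = (symbol, start)
        if key not in table:                          # not analysed yet
            found = set()
            for rhs in grammar[symbol]:
                positions = {start}
                for part in rhs:
                    positions = {e for p in positions
                                 for e in ends(part, p)}
                found |= positions
            table[key] = found                        # save the result
        return table[key]                             # reuse if present

    return ends, table


ends, table = make_recognizer(GRAMMAR, "the man bit the dog".split())
accepted = 5 in ends("S", 0)
```

Any later request for, say, a noun phrase at position 3 is answered
from the table rather than by repeating the analysis.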

Table Oriented Parsing Algorithms

The idea of the well-formed substring table is
sufficiently useful that some parsing algorithms have been
designed exclusively around that notion. Their purpose is
to fill in this table with entries saying "there is a
constituent of type x from position y to position z in the
input". Their acceptance criterion for a string is finding
in the table an entry indicating a constituent of type
"sentence" from the beginning to the end of the input
sequence. In the design of such an algorithm, one looks for
a strategy for filling in the table so that whenever, in
applying a grammar rule, one needs the answer to a question
"is there an x from y to z", the strategy will already have
considered all possible ways of filling that entry in the
table, and the answer can be determined by simply examining
the content of the [x,y,z] entry of the table. The
resulting algorithm consists mainly of walking this matrix
in an appropriate order and filling in entries on the basis
of other entries and the symbols in the input string. For
example, an algorithm due to Younger (1966) fills in the
entries in order of length of the resulting constituent (and
forbids grammar rules whose right-hand sides consist of a
single nonterminal). Since the lengths of the constituents
which match the right-hand side of a rule will be less than
the length of the constituent that will result, all the
necessary table entries for the constituents of a given
reduction will already have been made when that reduction is
considered. Thus when filling in the table for constituents
of length 3, for example, all of the entries for constituents
of length 2 and 1 will already have been made and any
questions about the existence of such constituents can be
answered by merely consulting the table. The constituents
of length 1 are found by matching singleton terminal rules
against the input string. When such an algorithm
terminates, if there is an entry for the initial symbol from
the beginning to the end of the input string, then the
string is accepted by the parser, otherwise it is rejected.
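
The length-ordered strategy can be sketched as follows. This is a
simplified illustration rather than Younger's algorithm as
published: it assumes every rule is either a singleton terminal
rule (held in LEXICAL) or has exactly two nonterminals on its
right-hand side (held in BINARY). All names and the toy grammar
are hypothetical.

```python
# Hypothetical toy grammar (an assumption for illustration).
LEXICAL = {"the": {"DET"}, "man": {"N"}, "dog": {"N"}, "bit": {"V"}}
BINARY = [("S", "NP", "VP"), ("NP", "DET", "N"), ("VP", "V", "NP")]


def fill_table(words):
    """Return the set of entries (x, y, z), each meaning "there is
    a constituent of type x from position y to position z"."""
    n = len(words)
    table = set()
    # Constituents of length 1 come from singleton terminal rules.
    for i, word in enumerate(words):
        for cat in LEXICAL.get(word, ()):
            table.add((cat, i, i + 1))
    # Longer constituents are filled in order of increasing length,
    # so every shorter entry needed is already in the table.
    for length in range(2, n + 1):
        for y in range(n - length + 1):
            z = y + length
            for mid in range(y + 1, z):
                for x, left, right in BINARY:
                    if (left, y, mid) in table and (right, mid, z) in table:
                        table.add((x, y, z))
    return table


def accepts(words, start="S"):
    return (start, 0, len(words)) in fill_table(words)
```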


Eliminating Redundancy

In the above type of algorithm, it is critical, in order
not to do a lot of excessive computation, that a particular
order of filling in the table be used. This is so that one
can rely on any answer that is needed having been put there
at an earlier point in the sequence. This has many
efficiency advantages for ordinary text parsing. However,
it has a disadvantage for speech understanding applications,
since one of the critical elements early in the chain may be
misheard or garbled and thereby keep the rest of the
analysis from being found (which could be used to help
identify the garbled word). This same disadvantage applies
to the left-first canonical derivation of a parse which we
mentioned earlier, and to any other parsing technique which
requires the individual steps in an analysis to be found in
a particular canonical order. If one of the critical things
that has to be found first in some such ordering is wrong
and if all of the subsequent processing is dependent on it,
then it will be very difficult to recover from the error. I
think, therefore, that it is important for speech
understanding to try to relax some of these ordering
restrictions. This is a fundamental departure from the way
that most text parsing systems operate and it is going to
require a different solution to the problem of finding the
same parse over and over again in different ways.

In many cases, it may be important to be able to jump
over and find the object noun phrase and then the verb
phrase when you haven't found the subject yet. For example,
in those cases where the subject wasn't findable because of
a garbled word, a well understood verb phrase could be used
to predict what kind of subject ought to be there. However,
in other cases when you have found the subject first on one
path, a computation path which finds the verb phrase and
then comes back and works on the subject will find the same
parsing over again. The solution that we have been using in
the BBN system (Woods, 1974) -- the solution which I think
has to be used -- is to put in appropriate checks at various
choice points to ask whether the thing that is about to be
produced has been found already on some other path and avoid
creating a duplicate. When this is done at the level of
noun phrases, embedded clauses, etc., it tends to block the
redundant generation of larger constituents before the
duplication becomes unmanageable. It still carries with it
the cost of the additional checking, but I think that this
cost is essential in order to cope with the errors that will
occur in speech.


Lexical  Ambiguity 

I've mentioned a number of things which make the
parsing problem for speech understanding more difficult than
traditional text parsing. Another difficulty is the
ambiguity of word identification in the input sequence of
sounds. The major source of lexical ambiguity in text
parsing is the possibility of multiple syntactic categories
for a given word. In a classical example of sentential
ambiguity, "Time flies like an arrow," the word "time" has
three possible syntactic categories (noun, verb, or
adjective), "flies" can either be a verb or a noun, and
"like" can either be a preposition or a verb. If we think
of a parser receiving a sequence of these kinds of
categories as input, there would be 3x2x2=12 strings of
syntactic categories that you could get for this sentence.
If you had to put each such sequence through the parser
separately (apparently some early parsers did exactly that)
you would be doing twelve separate parsings. Imagine what
would happen with a sentence of say 20 words with an average
ambiguity of 2 categories per word; you would have over
1,000,000 different possible such sequences. In speech
understanding, this basic ambiguity is magnified by the
inability to unambiguously determine the segmentation of
speech sounds into word sequences. Clearly one doesn't want
to run a parser on a separate enumeration of each possible
sequence of syntactic categories.
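
The counts quoted above can be checked with a few lines of Python.
The category names for "an" and "arrow" (one category each) are an
assumption made for this example; the categories for the other
three words are those given in the text.

```python
from itertools import product
from math import prod

# Categories per word; "determiner" for "an" and "noun" for "arrow"
# are assumed here, each word being taken as unambiguous.
CATEGORIES = {
    "time": ["noun", "verb", "adjective"],
    "flies": ["verb", "noun"],
    "like": ["preposition", "verb"],
    "an": ["determiner"],
    "arrow": ["noun"],
}

sentence = ["time", "flies", "like", "an", "arrow"]

# Explicit enumeration of every category sequence.
sequences = list(product(*(CATEGORIES[w] for w in sentence)))

# The same number obtained by multiplying the ambiguities.
count = prod(len(CATEGORIES[w]) for w in sentence)   # 3 x 2 x 2 x 1 x 1

# Twenty words with two categories each.
twenty_words = 2 ** 20                               # 1,048,576
```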


Word Lattices

A technique that has been very effective for dealing
with lexical ambiguity has been the use of a lattice of
input symbols rather than a single string. A simple example
of such a structure is illustrated in Figure 9.


TIME  FLIES  LIKE  AN  ARROW


Figure 9: A Sample Word Lattice
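
Inside a program, a lattice of this kind might be held as a list of
labeled edges between positions. The following Python sketch is
one possible representation, not the one used in any particular
system; the category abbreviations and the names LATTICE,
count_paths and match_rule are assumptions made for this example,
and edges are assumed always to run forward.

```python
from collections import defaultdict

# Each edge is (start position, end position, label).
LATTICE = [
    (0, 1, "N"), (0, 1, "V"), (0, 1, "ADJ"),   # time
    (1, 2, "V"), (1, 2, "N"),                  # flies
    (2, 3, "PREP"), (2, 3, "V"),               # like
    (3, 4, "DET"),                             # an
    (4, 5, "N"),                               # arrow
]


def count_paths(lattice, start, end):
    """Count the distinct label sequences from start to end
    without enumerating them."""
    outgoing = defaultdict(list)
    for a, b, _ in lattice:
        outgoing[a].append(b)
    memo = {}

    def paths_from(pos):
        if pos == end:
            return 1
        if pos not in memo:
            memo[pos] = sum(paths_from(b) for b in outgoing[pos])
        return memo[pos]

    return paths_from(start)


def match_rule(lattice, rhs, start):
    """Return the positions reached by matching the symbols of
    `rhs` along some path beginning at `start`."""
    positions = {start}
    for label in rhs:
        positions = {b for a, b, lab in lattice
                     if a in positions and lab == label}
    return positions
```

A single call such as match_rule(LATTICE, ("DET", "N"), 3) matches
the rule once, whatever categories were chosen for the earlier
words.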


Such a lattice compactly represents all of the possible
alternative sequences of input symbols with the common parts
of different sequences factored together so that processing
on them needs to be done only once. With such an input,
grammar rules are matched the same as before, except that as
a rule is matched against the input, particular paths are
selected through the word lattice which satisfy the match.
This technique has a tremendous benefit in terms of the
amount of computation required for parsing. When a
particular rule is matched at a given point in the word
lattice, all of the possible sequences of words in which the
matching sequence occurs are effectively factored together
so that the reduction is effectively performed
just once for an entire equivalence class of word sequences.
This technique is very attractive for speech understanding
because the possible alternative segmentations of the input
signal into words lead to a lattice structure similar to
that illustrated in Figure 9 (although of slightly more
varied structure). Whereas the structure in Figure 9 is
nothing more than a sequence of alternative syntactic
categories, the structures for word lattices in speech
understanding tend to have much more branching, and the
individual branches leaving a given point do not all come
together again at the same point. However, the same parsing
algorithm runs on this more generalized input lattice and
saves a tremendous amount of processing by avoiding the
multiplication of combinatorial possibilities.


Chart Parsers

The use of a word lattice for the input and the use of a
well-formed substring table for representing the
intermediate stages of parsing are closely related, and can
be combined into a single mechanism in a parsing
algorithm. The structure of the well-formed substring table
is exactly the same as that of the word lattice, and if it
is appropriately indexed by position in the input string (or
position in the word lattice), then it has the property of
compactly enumerating for a given position all of the
constituents which begin at that position and the
positions where such constituents end. A classical parsing
algorithm known as Cocke's algorithm (..., 1962) and a
generalization of that by Martin Kay which Kay now calls a
chart parser (1967) combine the two. They begin
with a lattice of input symbols, and their goal is to
expand the initial lattice of symbols into a complete
lattice (or chart) of all of the constituents that can be
found in any analysis of any path in the initial lattice.
An example of such a lattice for the sentence "time flies
like an arrow" is shown in Figure 10.


TIME  FLIES  LIKE  AN  ARROW


Figure 10: An Example of a Well-formed Substring Lattice or Chart


Each labeled horizontal line in the figure between vertical
strikes represents a segment added to the chart as a result
of the application of some rule (or one of the initial
entries in the word lattice). Both of these parsing
algorithms (Kay's and Cocke's) select a particular order for
walking the chart and adding new segments as a result of
matching rules against the segments already in the chart,
and both produce a very nice recognition algorithm that
keeps a great deal of the common parts of different analyses
merged together. The principal difference between Kay's
parser and Cocke's is Kay's generalization of the method to
handle general rewriting systems and an approximation to
transformational grammars. For strictly context free
grammars, both algorithms are effectively the same. In this
paper, I will call all such parsers (both Cocke's and Kay's)
and their derivatives "chart parsers". In particular, the
usual implementation of the classical nonpredictive
bottom-up parsing algorithm is a chart parser.


Parsing versus Recognition

In order to be called a parser, an algorithm must not
only calculate whether a string is accepted or not, as does
a recognizer, but it must also keep a record of the
derivation and provide one or more structural analyses of
the sentence. In my description of most of the parsing
algorithms so far, I have glossed over this distinction and
only the recognition aspects have been discussed. In order
to be a parser, an algorithm must keep track of and report
what constituents were used as pieces of what higher
constituents. This can be done conveniently for a chart
parser by annotating each of the segments of the chart with
a list of the constituents which formed it -- that is, by a
list of the segments which were combined by some rule to
produce the annotated segment. In general, there can be
several ways to form a given segment from different
sequences of constituents, so the annotation must provide for
several such constituent lists in order to represent all
possible analyses.

Both Cocke's algorithm and Kay's are bottom-up,
nonpredictive algorithms and share with other such
algorithms the property of finding many accidental
constituents that do not form a part of any complete
analysis. Such accidental constituents clutter up the
chart. Figure 11 shows a chart for our example in which all
such accidental segments have been removed. Such a "cleaned
up" chart together with its constituent pointers provides a
very compact representation of all of the possible
alternative analyses of the input with the common parts of
the different analyses merged together. Figure 12 shows the
chart of Figure 11 with constituent pointers added for one
particular parsing of the input, and Figure 13 shows the
same chart with all constituent pointers included.


Figure 13: A Chart Showing All Constituent Pointers for Two Parsings


In more typical cases, one cannot draw as nice a picture of
the chart as in our example (in particular it may not be a
planar graph), but inside a computer, a table of positions,
each with a set of associated segments (indicated by the
constituent type and the position where the segment ends)
suffices to handle the most general case for a recognizer.
For a parser, the inclusion with each segment of a list of
alternative constituent lists, each of which is a list of
segments (where a segment is named by its left and right end
points and the constituent label) suffices to produce a
compact representation of all the possible parses.
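
The representation just described can be sketched in Python as
follows. This is a deliberately simple nonpredictive bottom-up
chart builder, not Cocke's or Kay's algorithm as published: it
rematches all rules until nothing new is added. The lexicon, the
rules and all names are assumptions made for this example, and the
sketch assumes no empty right-hand sides and no cyclic
single-symbol rules.

```python
# Hypothetical lexicon and rules (assumptions for illustration).
LEXICON = {"time": ["N", "V", "ADJ"], "flies": ["N", "V"],
           "like": ["PREP", "V"], "an": ["DET"], "arrow": ["N"]}
RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("N",)),
    ("NP", ("N", "N")),
    ("NP", ("DET", "N")),
    ("VP", ("V", "NP")),
    ("VP", ("V", "PP")),
    ("PP", ("PREP", "NP")),
]


def build_chart(words, lexicon, rules):
    """Map each segment (label, left, right) to its list of
    alternative constituent lists."""
    chart = {}
    for i, word in enumerate(words):
        for cat in lexicon[word]:
            chart[(cat, i, i + 1)] = [(word,)]

    def matches(rhs, pos):
        """Yield tuples of adjacent segments labeled by rhs."""
        if not rhs:
            yield ()
            return
        for seg in list(chart):
            label, left, right = seg
            if label == rhs[0] and left == pos:
                for rest in matches(rhs[1:], right):
                    yield (seg,) + rest

    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            for pos in range(len(words)):
                for segs in list(matches(rhs, pos)):
                    seg = (lhs, pos, segs[-1][2])
                    lists = chart.setdefault(seg, [])
                    if segs not in lists:
                        lists.append(segs)     # one more way to form seg
                        changed = True
    return chart


def count_parses(chart, segment):
    """Count the distinct analyses represented under a segment."""
    total = 0
    for constituents in chart[segment]:
        ways = 1
        for part in constituents:
            if isinstance(part, tuple):        # a segment, not a word
                ways *= count_parses(chart, part)
        total += ways
    return total


chart = build_chart("time flies like an arrow".split(), LEXICON, RULES)
```

With this assumed grammar the sentence segment carries two
constituent lists, one for each parsing, and the chart also holds
accidental segments such as a sentence spanning only "flies like an
arrow".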


Earley's Algorithm


There is another parsing algorithm for context free
grammars due to Jay Earley (Earley, 1970), which can be
thought of as a predictive chart parser. This algorithm
combines the benefits of the systematic, lattice-oriented
parsing of the well-formed substring or chart parser with
the advantages of predictive analysis. Although the
algorithm was developed in the context of parsing for
computer programming languages, and is presented as such by
Earley, the algorithm has many theoretical advantages for
parsing context free grammars in general, and an
appreciation of its operation is important for the
understanding of context free parsing. Earley's algorithm
does not quite fit into either the top-down or the bottom-up
models, or rather it seems to fit equally well into both.
Starting from the beginning of the string, it begins to fill
in a table (which Earley calls a state table) in which it
records for each position in the input string, each rule of
the grammar that has been partially matched up to that point
or which might possibly match beginning at that point (i.e.
each rule that would be consistent with what has been parsed
so far to the left of that point). The table is organized
into columns, one for each position in the input string, and
the procedure for filling out a given column i + 1 is as
follows:

1. (transition) Make entries in the column for rules
that appear in the preceding column and whose match can be
continued by matching the input symbol associated with this
column.


2. (prediction or "pushing") Make beginning entries in
this column for every constituent which could be used to
continue a match for a rule already in this column. (Each
rule remembers the column in which its match was begun so
that when the match is completed, the algorithm can return
to the column where the constituent was wanted and continue
the match of any and all rules which wanted it. This memory
of the column in which a subconstituent match was begun
replaces the use of a stack in most predictive algorithms,
and gains a combinatorial benefit by not having to enumerate
all of the different possible stacks which could sit above a
given subconstituent computation. A given constituent in a
given place will be looked for and found only once.)

3. (completion or "popping") For each rule whose match
has just been completed in this column, go back to the
column where that match was begun and pick up and continue
the match of all rules which can use the constituent just
formed.

In Earley's statement of the algorithm the progress of
a rule match is recorded by a pair of numbers -- the rule
number and the number of symbols in the right-hand side of
the rule which have already been matched. An entry in the
table consists of these two numbers plus the number
indicating the column in which the rule match was begun. A
sentence is accepted if, when the last column is filled out,
it contains an entry for a rule whose left-hand side is S,
whose match has been completed, and whose match was begun in
column 0. (The algorithm begins by initializing column 0 to
contain all of the rules whose left-hand sides are S.)

Earley's algorithm is frequently thought of as a
top-down parsing algorithm because of the way that it starts
with the assumption that it is going to build a sentence and
successively elaborates its set of rules to be looked for by
passing information "down" in step 2. However, once such
top-down prediction (which incidentally may leap over what
will amount to many cycles of left recursion in the final
analysis) has determined the set of rules to be used at a
given point, the subsequent analysis will do almost the same
kind of bottom-up structure building as any other chart
parser, the principal difference being that for Earley's
algorithm we will get a subset of those entries in the chart
that would have been produced by an ordinary chart parser.
This is because the prediction technique has eliminated all
those entries which are not at least consistent with some
analysis of the string to the left. Once again, this
prediction is a mixed blessing for speech understanding
since if the prediction is made on the basis of unreliable
evidence, it may keep us from finding enough of an analysis
to benefit error correction.
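
The three steps and the form of the table entries can be sketched
in Python as a recognizer. This is a minimal sketch under stated
assumptions, not Earley's published formulation: it omits
lookahead, it assumes no rule has an empty right-hand side, and
the toy grammar (which includes a left-recursive rule) and all
names are hypothetical. An entry is the triple (rule number,
number of symbols matched, column where the match began).

```python
# Hypothetical toy grammar (an assumption for illustration).
RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("DET", "N")),
    ("NP", ("NP", "PP")),      # left-recursive rule
    ("VP", ("V", "NP")),
    ("PP", ("P", "NP")),
    ("DET", ("the",)),
    ("N", ("man",)), ("N", ("dog",)), ("N", ("park",)),
    ("V", ("bit",)),
    ("P", ("in",)),
]


def earley_accepts(words, rules, start="S"):
    nonterminals = {lhs for lhs, _ in rules}
    n = len(words)
    columns = [[] for _ in range(n + 1)]

    def add(col, entry):
        if entry not in columns[col]:
            columns[col].append(entry)

    # Column 0 starts with every rule whose left-hand side is S.
    for r, (lhs, _) in enumerate(rules):
        if lhs == start:
            add(0, (r, 0, 0))

    for i in range(n + 1):
        # 1. Transition: continue matches from the preceding column
        #    over the input symbol associated with this column.
        if i > 0:
            for r, dot, origin in columns[i - 1]:
                rhs = rules[r][1]
                if dot < len(rhs) and rhs[dot] == words[i - 1]:
                    add(i, (r, dot + 1, origin))
        k = 0
        while k < len(columns[i]):          # the column grows as we go
            r, dot, origin = columns[i][k]
            k += 1
            lhs, rhs = rules[r]
            if dot < len(rhs):
                # 2. Prediction: begin every rule for the wanted symbol.
                if rhs[dot] in nonterminals:
                    for r2, (lhs2, _) in enumerate(rules):
                        if lhs2 == rhs[dot]:
                            add(i, (r2, 0, i))
            else:
                # 3. Completion: return to the column where this match
                #    began and advance every rule that wanted it.
                for r2, dot2, origin2 in list(columns[origin]):
                    rhs2 = rules[r2][1]
                    if dot2 < len(rhs2) and rhs2[dot2] == lhs:
                        add(i, (r2, dot2 + 1, origin2))

    return any(rules[r][0] == start and dot == len(rules[r][1])
               and origin == 0 for r, dot, origin in columns[n])
```

Note that no stack appears anywhere; the origin column stored in
each entry plays that role.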


Transition  Network  Grammars 

The presentation so far has been illustrated by two
extremely simple sample grammars. When one begins to write
a grammar for any appreciable subset of natural language,
one finds that there are some verb phrases which consist of
a verb alone, some with a verb plus an object noun phrase,
some with a verb, an indirect object and a direct object,
any of these three forms with a prepositional phrase added,
any of the three forms with two prepositional phrases, etc.
If one were to write each of these as a separate context
free rule, as illustrated in Figure 14a, one would find a very
rapid proliferation (possibly infinite) of rules that share
much of their right-hand sides. People who write
grammars immediately find themselves falling into notations
such as that illustrated in Figure 14b in which optional
constituents, alternative constituent sequences, and
repeatable constituents are indicated by some notation
(usually parentheses for optionality, curly brackets or
vertical strokes for alternative sequences, and the Kleene
star operator (*) for repeatable constituents). These are
usually thought of just as abbreviations for a set of
ordinary context free rules, but the actual expansion of
such notations into ordinary context free rules is a very
bad way to implement them. Instead, one gains an advantage
in parsing by exploiting these primitive notions of
optionality, alternatives, and repeatability directly.
Transition network grammars provide a mechanism for doing this.

A basic transition network (BTN) is essentially a
finite state transition diagram to which recursion has been
added by fiat (see Woods, 1969, 1970, 1973a). The result is
no longer a finite state device, but rather is formally
equivalent to a pushdown store automaton or a context free
grammar. The BTN is a labeled, directed graph whose nodes,
which we call states, represent states which the grammar can
be in during the course of generating (or analyzing) a sentence,
and whose arcs represent transitions from state to state.
The labels on the arcs indicate the input symbol or type of
phrase which must be consumed from the input string in order
to make the transition. It is the possibility of arcs
(called PUSH arcs) labeled with the names of phrase
constituents that provides the recursion which makes this
model more than finite state. The grammar contains a start
state for each of the types of constituents which can be
called for on a PUSH arc, and distinguished states called
final states which represent the completion of the analysis
of some constituent. A PUSH arc can be taken if some string
accepted by the start state associated with the label of
that PUSH arc is consumed (or generated). There is a
mechanical procedure presented in Woods (1969) for
transforming any given context free grammar into an
equivalent BTN and performing a number of optimizing
transformations on the resulting BTN to produce a grammar
which is more compact and more efficient for parsing than
the original context free grammar. Essentially the BTN
provides a way to factor a context free grammar into a
finite state part and a recursive part so that as much of
the grammar as possible can be expressed in the finite state
part and optimized by the same techniques applicable to
finite state grammars.

The notations used by linguists for representing
alternative sequences and repeatable constituents in their
grammar rules correspond to the operations called "union"
and "closure" in the theory of finite state automata, which
together with the operation of concatenation are known to
generate the finite state languages. Thus, the right-hand
sides of grammar rules using these notations are merely
notational variants of what is in automata theory called a
"regular expression", and there exist formal procedures for
translating such a representation into an equivalent
transition diagram for a finite state machine. These same
procedures can be used to translate a context free grammar
using these notations into an equivalent BTN, such as the
one illustrated in Figure 14c.

[Panel a: list of separate rules not recoverable]
a. SEPARATE CONTEXT FREE GRAMMAR RULES

VP -> V (NP (NP)) (PP)*
b. MERGED REPRESENTATION

[Panel c: network diagram not recoverable]
c. REPRESENTATION AS BASIC TRANSITION NETWORK (BTN)


Figure 14: Alternative Representations for Multiple
Right-hand Sides of Grammar Rules

Thus the BTN formalism provides a realization for these
notions  of  alternative  sequences  and  repeatable  constituents 
that  is  more  efficient  for  a parser  as  well  as  being  less 
redundant  as  a linguistic  specification.  Each  of  the  arcs 
leaving  a given  state  represents  an  alternative  possible 
continuation  of  the  string  being  generated  (or  of  the 
analysis  of  a given  string). 
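
The way a parser follows the arcs of a BTN can be sketched in a few lines of Python. The sketch below is only an illustration: the state names, the tiny lexicon, and the NP and PP networks are assumptions made up for the example, and only the VP network is modeled on the merged rule VP -> V (NP (NP)) (PP)*. A CAT arc consumes one word of the named category; a PUSH arc calls recursively for a whole constituent.

```python
# A basic transition network as a dict: state -> list of arcs.
# Each arc is (kind, label, next_state).  "CAT" consumes one word of
# the given category; "PUSH" recognizes a whole constituent by
# recursively entering the network whose start state has that name.
# All state names and lexicon entries are illustrative assumptions.
NETWORKS = {
    "VP":     [("CAT", "V", "VP/V")],
    "VP/V":   [("PUSH", "NP", "VP/NP"), ("PUSH", "PP", "VP/PP")],
    "VP/NP":  [("PUSH", "NP", "VP/NP2"), ("PUSH", "PP", "VP/PP")],
    "VP/NP2": [("PUSH", "PP", "VP/PP")],
    "VP/PP":  [("PUSH", "PP", "VP/PP")],      # loop: (PP)*
    "NP":     [("CAT", "DET", "NP/DET"), ("CAT", "N", "NP/N")],
    "NP/DET": [("CAT", "N", "NP/N")],
    "NP/N":   [],
    "PP":     [("CAT", "P", "PP/P")],
    "PP/P":   [("PUSH", "NP", "PP/NP")],
    "PP/NP":  [],
}
FINAL = {"VP/V", "VP/NP", "VP/NP2", "VP/PP", "NP/N", "PP/NP"}
LEXICON = {
    "slept": {"V"}, "gave": {"V"}, "put": {"V"},
    "mary": {"N"}, "book": {"N"}, "table": {"N"}, "kitchen": {"N"},
    "the": {"DET"}, "on": {"P"}, "in": {"P"},
}


def traverse(state, words, i):
    """Yield every input position at which the constituent can end."""
    if state in FINAL:
        yield i
    for kind, label, nxt in NETWORKS[state]:
        if kind == "CAT":
            if i < len(words) and label in LEXICON.get(words[i], ()):
                yield from traverse(nxt, words, i + 1)
        else:  # PUSH arc: recognize a sub-constituent, then continue
            for j in traverse(label, words, i):
                yield from traverse(nxt, words, j)


def accepts(category, sentence):
    """True if the whole sentence is one constituent of this category."""
    words = sentence.lower().split()
    return any(end == len(words) for end in traverse(category, words, 0))
```

Because the verb arc and the shared tails of the alternatives appear only once in the network, the work of matching them is done once rather than once per expanded rule. This simple top-down traversal would loop on a left-recursive network; it is not the Earley-style algorithm discussed in the text.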

The transition network grammar effectively provides for
the merging of common parts of what would be different
context free rules, and this permits parsing operations to
be performed only once on such parts instead of separately
on each individual copy as would be the case if the
expressions were expanded into separate ordinary context
free rules. Most of the parsing algorithms for context free
grammars have natural generalizations to transition network
grammars which take advantage of this merging. In
particular, Earley's algorithm is a natural algorithm for
BTN grammars, and the number of parsing operations required
by Earley's algorithm for parsing with an optimized BTN
grammar compared to parsing with an equivalent context
free grammar can easily be less by factors of four or five.
A presentation of a version of Earley's algorithm for BTN's
is given in Woods (1969).


Grammars  for  natural  English 

In  comparing  the  models  of  the  Chomsky  hierarchy  with 
each  other,  it  has  been  found  that  whereas  the  finite  state 
grammars  have  great  computational  advantages  for  parsing 
(there  exist  formal,  mechanical  optimizing  procedures  of 
various types for finite state machines), the absence of
recursion makes them unsuitable for natural language analysis.
On  the  whole,  context  free  grammars  provide  the  simplest  and 
most  natural  grammars  for  natural  language  but  are  formally 
incapable  of  dealing  with  certain  kinds  of  coordinate 
constructions  and  discontinuous  constituents.  Context 
sensitive  grammars  have  sufficient  formal  power  to  provide  a 
recognizer  for  such  constructions,  but  provide  no  useful 
structural  descriptions.  General  rewriting  systems  add  no 
useful  power  not  already  present  in  context  sensitive 
grammars and have the undesirable consequence that it is not
possible to have a parsing algorithm for the entire class of
such grammars.


Transformational Grammars

There are a number of other grammar formalisms that
have been proposed for natural language which have been
shown to be equivalent to the ordinary context free grammar
model. One formalism, however, with considerably more power
than context free grammars has stimulated linguistics and
served as the vehicle for most of the study of natural
language grammar in the last decade. This is the
transformational grammar of Chomsky. A transformational
grammar basically consists of a context free "base" grammar
plus a set of transformational rules which can permute the
order of constituents and in general move, delete, and
insert constituents at various positions in the parse tree.
Transformational rules can also test conditions such as
identity of constituents and the presence of syntactic
features associated with the words and sometimes the phrases
of the sentence. Perhaps the simplest example of a
transformational rule is the passive transformation shown in
Figure 15, which produces the "surface structure" for a
passive sentence from the "deep structure" that underlies
the corresponding active sentence.

PASSIVE

NP  (AUX)  V  NP

1    2     3  4

4  2  BE + EN + 3  BY + 1

CONDITION: 4 ≠ 1

a. STATEMENT OF THE RULE


Figure 15: A Sample Transformational Rule: The Passive
Transformation


The rule says that if you can analyze an intermediate phrase
structure tree into a sequence consisting of a noun phrase,
optionally an auxiliary verb, followed by a main verb and an
object noun phrase, then you can transform the tree by
moving the subject noun phrase (1) to the position of the
object noun phrase (4), appending the word "by" on its left,
moving the object noun phrase to subject position, and
appending the morphemes "be" and "en" to the left of the
main verb. This rule changes the tree structure
corresponding to "Mary shot John" into that corresponding to
"John was shot by Mary". (A later rule will move the "en"
to the right of the next verb and a "post cyclic" rule will
combine the two into a past participle.) The generation of a
sentence by a transformational grammar consists of the
generation of a deep structure tree by means of the context
free base grammar and then transforming this tree through a
series of intermediate structures into the surface structure
tree by means of the transformational rules, which are
usually ordered, marked as optional or obligatory, and
applied cyclically to successive embedded clauses in complex
sentences.
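
The structural change performed by the passive rule can be sketched in Python as follows. This is a deliberately flattened illustration, not a transformational component: it assumes the tree has already been analyzed into a flat list of (category, text) pairs, whereas real transformations operate on trees, and it leaves out tense and the later rules that turn BE + EN + verb into "was shot". The function name and the pair representation are assumptions made for the example.

```python
def passive(constituents):
    """Apply  NP (AUX) V NP  =>  4 2 BE+EN 3 BY+1  to (category, text) pairs.

    Returns the transformed list, or None when the structural description
    does not match or the condition 4 != 1 fails.
    """
    cats = [cat for cat, _ in constituents]
    if cats == ["NP", "AUX", "V", "NP"]:
        np1, aux, verb, np4 = constituents
    elif cats == ["NP", "V", "NP"]:          # the AUX is optional
        (np1, verb, np4), aux = constituents, None
    else:
        return None
    if np4 == np1:                            # CONDITION: 4 != 1
        return None
    result = [np4]                            # object moves to subject position
    if aux is not None:
        result.append(aux)
    # BE and EN go to the left of the main verb; BY goes to the left of
    # the old subject, which now occupies the object position.
    result += [("BE", "be"), ("EN", "en"), verb, ("BY", "by"), np1]
    return result
```

Applied to the pairs for "Mary shoot John", the function returns the sequence John, be, en, shoot, by, Mary, which the later rules mentioned in the text would turn into "John was shot by Mary".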

The  transformational  grammar  appears  capable  of 
capturing  the  major  syntactic  facts  about  natural  language, 
and  a great  deal  of  our  current  knowledge  about  the  syntax 
of English has been discovered and codified in terms of this
model. However, it is extremely inefficient to parse with
such a grammar and no parsing algorithm suitable for parsing
any significant amount of text has ever been developed for
this grammar model, although Stanley Petrick has spent
considerable effort in this direction for a number of years
and has probably the only working parsing algorithm for
transformational  grammars  in  existence  (Petrick,  1965). 


Augmented Transition Networks

In order to obtain a grammar formalism with the
linguistic adequacy of a transformational grammar while
preserving the efficiency of the various context free
parsing algorithms, I have been developing and refining a
model of grammar which I call an augmented transition
network (ATN). Presentations of this model appear in Woods
(1969, 1970, 1973a). Earlier attempts along similar lines
were made by Thorne, Bratley, and Dewar (1968) and by Bobrow
and Fraser (1969). An ATN consists of a basic transition
network grammar augmented with a set of registers which are
carried along with the state and which can hold arbitrary
pieces of tree structure, and with arbitrary conditions and
actions associated with the arcs of the grammar which can
test and set the contents of these registers. As a parsing
proceeds with an ATN grammar, the conditions and actions
associated with the transitions can put pieces of the input
string into registers, use the contents of registers to
build larger structures, check whether two registers are
equal, etc. It turns out that this model can construct the
same kinds of structural descriptions as those of a
transformational grammar and can do it in a much more
economical way. The merging of common parts of alternative
structures, which the network grammar provides, permits a
very compact representation of quite large grammars, and
this model has served as the basis for several natural
language understanding systems such as the LUNAR system
(Woods, Kaplan, and Nash-Webber, 1972; Woods, 1973b). For
speech understanding, the transition network grammar is one
of the few linguistically adequate grammars for natural
English that are at all amenable to coping with the
combinatorial problems. This model is being used as the
basis of the syntactic component of the BBN speech
understanding project (Bates, 1974; Woods, 1974). Other
types of context free grammars can be augmented by
conditions and actions associated with the grammar rules in
a similar way, but such grammars lose the benefits of the
transition networks (such as merging common parts of
different rules) which we discussed previously. Another
advantage of the transition network formalism is the ease
with which one can follow the arcs backwards and forwards in
order to predict the types of constituents or words which
could occur to the right or left of a given word or phrase.
One of the important roles of a syntactic component in
speech understanding is to predict those places where small
function words such as "a", "an", "of" should occur, since
such words are almost always unstressed and difficult to
find unambiguously in the input. In the BBN speech system
such words are almost always found as a result of syntactic
prediction and are not even looked for during lexical
analysis since spurious matches would be found more often
than correct ones.
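
The role of registers, conditions, and actions on the arcs can be illustrated with a very small Python sketch. Everything in it is an assumption made for illustration (the state names, the register names SUBJ, V, and OBJ, the five-word lexicon, and the omission of PUSH arcs); it is not the LUNAR or BBN grammar. It shows how an action on the participle arc can rearrange registers so that an active and a passive sentence receive the same structural description.

```python
# word -> (category, root form); an invented toy lexicon
LEXICON = {
    "john": ("NP", "John"), "mary": ("NP", "Mary"),
    "shot": ("V", "shoot"), "was": ("BE", "be"), "by": ("BY", "by"),
}

# state -> arcs; each arc is (category, condition, action, next state).
# Conditions test the registers; actions set them.
ARCS = {
    "S":      [("NP", lambda r: True, lambda r, w: r.update(SUBJ=w), "S/SUBJ")],
    "S/SUBJ": [("V",  lambda r: True, lambda r, w: r.update(V=w), "S/V"),
               ("BE", lambda r: True, lambda r, w: r.update(V=w), "S/V")],
    "S/V":    [("NP", lambda r: r["V"] != "be",
                      lambda r, w: r.update(OBJ=w), "S/END"),
               # "be" followed by a verb: treat as passive, so the
               # surface subject becomes the underlying object.
               ("V",  lambda r: r["V"] == "be",
                      lambda r, w: r.update(OBJ=r["SUBJ"], SUBJ=None, V=w),
                      "S/PASS")],
    "S/PASS": [("BY", lambda r: True, lambda r, w: None, "S/BY")],
    "S/BY":   [("NP", lambda r: r["SUBJ"] is None,
                      lambda r, w: r.update(SUBJ=w), "S/END")],
    "S/END":  [],
}
FINAL = {"S/END"}


def parse(sentence):
    """Return a structural description, or None if no path succeeds."""
    words = sentence.lower().split()

    def step(state, i, regs):
        if i == len(words):
            if state in FINAL:
                return ("S", ("SUBJ", regs["SUBJ"]),
                        ("V", regs["V"]), ("OBJ", regs["OBJ"]))
            return None
        entry = LEXICON.get(words[i])
        if entry is None:
            return None
        category, root = entry
        for arc_cat, condition, action, nxt in ARCS[state]:
            if arc_cat == category and condition(regs):
                new_regs = dict(regs)   # registers are carried with the state
                action(new_regs, root)
                result = step(nxt, i + 1, new_regs)
                if result is not None:
                    return result
        return None

    return step("S", 0, {"SUBJ": None, "V": None, "OBJ": None})
```

In this sketch both "Mary shot John" and "John was shot by Mary" yield the same description, with Mary as subject and John as object, without any separate transformational step.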

The  ATN  formalism  suggests  a way  of  viewing  a grammar 
as  a map  with  various  landmarks  and  recognizable  locations 
that  one  encounters  in  the  course  of  crossing  a sentence 
from left to right. For speech understanding this
perspective  is  beneficial,  for  example,  in  attempting  to 
correlate  various  prosodic  characteristics  of  sentences  with 
such  "geographical  landmarks"  within  the  structure  of  a 
sentence. 

Let  me  conclude  this  presentation  of  syntactic 
techniques with a reiteration that I have not attempted to
make a case that any one parsing technique or grammar
formalism is uniformly better than others (indeed I do not
believe  there  is  a best  one  for  all  applications).  Rather, 
I have  attempted  to  give  sufficient  insight  into  the 
relative  advantages  and  disadvantages  to  enable  the  reader 
to  make  appropriate  choices  for  particular  applications. 


Part II, Semantics

Turning now to the subject of semantics, I should
perhaps first make the point that the word "semantics" means
different things to different people. There is a tradition
in philosophy and logic that specifies the semantics of
formal systems such as the propositional calculus in terms
of a set of "truth conditions" for each possible expression
in the system. These truth conditions are abstract entities
which specify the situations or "possible worlds" in which
the statement would be true. In linguistics, on the other
hand, concern is usually devoted to finding a notation or
representation in which to specify each of the different
possible interpretations or "readings" which a natural
language sentence can have and to procedures for determining
whether a sentence is meaningful or "anomalous" (i.e. not
meaningful). The linguist does not usually follow this up
by providing a semantics in terms of truth conditions for
his notation. In the field of programming languages in
computer science, the semantics of a programming language is
specified in terms of the computations which the machine is
to perform as a result of a given expression. In specifying
a formal semantics for such systems, however, one usually
takes recourse to defining the semantics by reducing it to
another notation such as those of elementary arithmetic,
whose semantics is presumably understood. In the fields of
computational linguistics and artificial intelligence, the
term is perhaps most misused. In some cases, it is taken to
cover everything that isn't syntax -- i.e. everything that
is not part of a grammar, while in others it is asserted to
be no different in principle from syntax, and any basis for
a distinction between the two is denied.

While I don't have the space here to go into a complete
exposition of the different concerns of all of these
different perspectives on semantics, I will try to give a
brief synopsis of the distinctions.

Let  us  begin  by  considering  what  all  of  these  different 
things  which  call  themselves  semantics  have  in  common. 
According to my dictionary, semantics is "the scientific
study of the relations between signs or symbols and what
they denote or mean." This is the traditional use of the
term and represents the common thread which links the
different concerns discussed above. Notice that the term
does not refer to the things denoted or the meanings, but to
the relations between these things and the linguistic
expressions which denote them. Thus, although it may be
difficult to isolate exactly what part of a system is
semantics, any system which understands sentences and
carries out appropriate actions in response to them is
somehow completing this connection, and therefore is
applying semantic knowledge to this task. One of the common
misuses of the term semantics in the fields of computational
linguistics and artificial intelligence is to extend the
coverage  of  the  term  not  only  to  this  relation  between 
linguistic  form  and  meaning,  but  to  all  of  the  retrieval  and 
inference  capabilities  of  the  system.  This  misuse  arises 
since  for  many  tasks  in  language  processing,  the  use  of 
semantic  information  to  make  an  evaluation  necessarily 
involves  not  only  the  determination  of  the  object  denoted, 
but also some inference about that object. In the absence of a
good name for this further inference process, terms such as
"semantic inferences" have come to be used for the entire
process. I regret to say that I have no really good
substitute term for such processes and, since the
terminology is so well established in some of the
literature, I will use the term "semantic inferences" in
this paper in referring to inferences that cross the
boundary between symbol and referent and then draw
conclusions about that referent. (One must be aware, however,
that not all writers who use this term mean the same thing
by it.)

The concerns of the linguists and the philosophers in
the areas of semantics are effectively two halves of the
same process, both of which the fields of computational
linguistics and speech understanding will have to cope with.
In reducing the semantics of natural language sentences to
some formal notation, the linguist has only completed half
of the job if he does not go on and specify a semantics of
the resulting formal system. It is at this point that the
concerns of philosophers and logicians in specifying the
semantics for formal systems take over. Notice that
specifications  of  the  formal  semantics  of  programming 
languages  in  terms  of  the  notations  of  elementary  arithmetic 
are  satisfactory  only  to  the  extent  that  we  understand  fully 
what  these  notations  themselves  mean.  This  is  also  the  case 
for  specifying  the  semantics  of  natural  language. 

I hope the above presentation has alerted you to some
of  the  different  kinds  of  things  to  which  the  term  semantics 
can  refer,  and  I will  attempt  to  make  clear  which  one  I am 
using  in  the  remainder  of  this  presentation.  I should  point 
out  that  in  the  field  of  computational  linguistics  we  don't 
have  nearly  as  good  an  understanding  of  semantics  as  we  do 
of  syntax.  I cannot  give  you  the  same  kind  of  evolution  of 
ideas  through  successively  more  powerful  models  and 
techniques,  all  of  which  are  well  understood.  Here  instead, 
the mechanisms which we understand thoroughly are known to
be inadequate for dealing with many aspects of the problem,
and the techniques which hold promise of dealing with some
of the more difficult problems are not yet sufficiently
understood or tested for anyone to say whether they in fact
solve  the  problem  or  not.  In  this  area,  then,  we  have  many 
promising  approaches,  but  few  definite  answers. 

What I will attempt to do here is provide an
understanding of some basic principles of semantic
representation and interpretation that will apply to any
system that understands natural language (whether text or
speech), and then some specific techniques which I think
have direct relevance to speech understanding. In
particular, I will describe two techniques which are being
applied in the BBN speech understanding system. One is the
technique of semantic interpretation into procedural
semantics which I have applied effectively to several
natural language question-answering applications, and the
other is the technique of "semantic intersections" in
semantic network representations of knowledge which was
developed by Quillian (1968, 1969). For more details on the
specific applications of the latter technique to speech
understanding, see Nash-Webber (1974 and 1975). For the
most part, the details of many other interesting things that
are being done in the area of computational semantics for
natural language will have to be left to the references.
Articles which may be of interest include: Bruce (1973),
Carbonell and Collins (1974), Collins and Quillian (1969),
Collins and Warnock (1974), Fillmore (1968), Green and
Raphael (1968), Heidorn (1972), Norman and Rumelhart (1973),
Sandewall (1971), Winograd (1972), Woods (1967), and
articles by Newell, Simmons, Wilks, Winograd, Schank, Colby,
Abelson, Hunt, Lindsay, and Becker in Schank and Colby
(1973).


Procedural  Semantics 

It appears that the programming language theorists
stand on firmer ground than the philosophers or the
linguists in specifying the semantics of their systems,
since they can define the semantics of their notations in
terms of the procedures that the machine is to carry out.
Notice that the notion of procedure shares with the notion
of meaning that elusive quality of being impossible to
present except by means of alternative representations. The
procedure itself is something abstract which is instantiated
whenever someone carries out the procedure, but otherwise,
all one has when it is not being executed is some
representation of it.

Although in ordinary natural language not every
sentence is overtly dealing with procedures to be executed,
it is possible nevertheless to use the notion of procedures
as a means of specifying the truth conditions of declarative
statements as well as the intended meaning of questions and
commands. One thus picks up the semantic chain from the
philosophers at the level of truth conditions and completes
it to the level of formal specifications of procedures.
These can in turn be characterized by their operations on
real  machines  and  can  be  thereby  anchored  to  physics.  This 
notion  of  characterizing  the  truth  conditions  of  sentences 
in  terms  of  mechanical  procedures  is  one  that  I called 
"procedural  semantics"  in  my  1968  AFIPS  paper  (Woods,  1968) 
and  the  term  has  since  gained  wide  circulation.  The 
application  of  this  technique  in  computer  systems  for 
natural  language  understanding  has  been  very  effective.  Two 
notable  computer  systems  which  make  use  of  this  type  of 
semantics are the LUNAR system (Woods, Kaplan, and
Nash-Webber, 1972; Woods, 1973b) and the blocks world system
of Winograd (1972). The former understands and answers
questions such as "What is the average concentration of
aluminum  in  high  alkali  rocks?",  while  the  latter 
understands  and  carries  out  instructions  such  as  "Put  the 
pyramid  on  the  block  in  the  corner,"  (including  resolving 
the  ambiguity  by  determining  whether  there  is  a pyramid  on  a 
block  or  a block  in  the  corner).  Since  the  semantic 
techniques  used  in  the  LUNAR  system  are  more  formalized  and 
rule  driven  and  since  I am  more  familiar  with  the  details  of 
that  system,  I will  use  LUNAR  as  the  principal  illustration 
of  the  technique.  I think  the  rules  used  there  can 
effectively  serve  as  a formal  model  for  what  is  going  on  in 
a number  of  other  language  understanding  systems. 


Semantics  in  LUNAR 

The  semantic  framework  of  the  LUNAR  system  consists  of 
three  parts  --  a semantic  notation  in  which  to  represent  the 
meanings of the sentences, a specification of the semantics
or meanings of this notation by means of LISP programs, and
a procedure for assigning representations in the notation to
input sentences. In LUNAR, the semantic notation (which I
have referred to there as a query language) consists of an
extended  notational  variant  of  the  predicate  calculus. 

The  query  language  contains  essentially  three  kinds  of 
constructions:

1)  designators, which name or denote objects or classes
of  objects  in  the  data  base, 

2)  propositions,  which  correspond  to  statements  that 
can  be  either  true  or  false  in  the  data  base,  and 

3)  commands,  which  initiate  and  carry  out  actions. 

Designators come in two varieties -- individual specifiers
and class specifiers. Individual specifiers correspond to
proper nouns and variables. For example, S10046 is a
designator for a particular sample, OLIV is a designator for
a certain mineral (olivine), and X3 can be a variable
denoting any type of object in the data base. Class
specifiers are designators used to denote classes of
individuals over which quantification can range. They
consist of the name of an enumeration function for the class
plus arguments. For example, (SEQ TYPECS) is a
specification of the class of type C rocks (i.e. breccias)
and (DATALINE S10046 OVERALL OLIV) is a specification of the
set of lines of a table of chemical analyses which
correspond to analyses of sample S10046 for the overall
concentration of olivine.

Elementary propositions are formed from predicates with
designators as arguments, and complex propositions are
formed from these by use of the logical connectives AND, OR,
and NOT and by quantification. For example,
(CONTAIN S10046 OLIV) is a proposition formed by
substituting designators as arguments to the predicate
CONTAIN, and (AND (CONTAIN X3 OLIV) (NOT (CONTAIN X3 PLAG)))
is a complex proposition corresponding to the assertion that
X3 contains olivine but does not contain plagioclase.
Elementary commands consist of the name of a command
function plus arguments, and like propositions, complex
commands can be constructed using logical connectives and
quantification. TEST is a command function for testing the
truth value of a proposition given as its argument. Thus,
(TEST (CONTAIN S10046 OLIV)) will answer yes or no depending
on whether sample S10046 contains olivine. Similarly
PRINTOUT is a command function which prints out a
representation for a designator given as its argument.

The  format  for  a quantified  proposition  or  command  is: 

(FOR QUANT X / CLASS : PX ; QX)

where QUANT is a type of quantifier (EACH, EVERY, SOME, THE,
numerical quantifiers, etc.), X is a variable of
quantification, CLASS is a class specifier for the class of
objects over which quantification is to range, PX specifies
a restriction on the range, and QX is the proposition or
command being quantified. (Both PX and QX may themselves be
quantified expressions.) For example (FOR EVERY X1 / (SEQ
TYPECS) : (CONTAIN X1 PLAG) ; (CONTAIN X1 OLIV)) is a
quantified proposition corresponding to the statement that
every type C rock that contains plagioclase also contains
olivine. (FOR EVERY X2 / (DATALINE S10046 OVERALL OLIV) : T
; (PRINTOUT X2)) is a quantified command to print out all of
the chemical analyses of S10046 for overall olivine
concentrations. (For expository reasons, the notation has
been slightly simplified here compared to that actually used
in  the  LUNAR  system,  but  the  differences  are  minor.) 


Semantics  of  the  Notation 

Having  specified  our  semantic  notation  for  representing 
the  meanings,  we  must  now  specify  the  meanings  of  our 
notations. As mentioned before, we do this in LUNAR by
relating  the  notations  to  procedures  which  can  be  executed. 
For  each  of  the  predicate  names  that  can  be  used  in 
specifying  semantic  representations,  we  will  specify  a 
procedure  or  subroutine  which  will  determine  the  truth  of 
the predicate for given values for the arguments. Similarly
for  each  of  the  functions  which  can  be  used,  we  will  specify 
a procedure  which  can  compute  the  value  of  that  function 
given  the  values  of  its  arguments.  For  each  of  the  class 
specifiers  for  the  FOR  function,  we  will  require  a 
subroutine  which  enumerates  the  members  of  the  class.  The 
FOR  function  itself  is  also  defined  by  a subroutine  as  are 
the  logical  operators  AND,  OR  and  NOT  and  the  basic  command 
functions  TEST  and  PRINTOUT.  Thus  any  well  formed 
expression  in  the  query  language  is  a composition  of 
functions  which  have  procedural  definitions  in  the  retrieval 
component  and  are  therefore  themselves  well  defined 
procedures  capable  of  execution  on  the  data  base.  In  fact 
in the LUNAR system, the definition of all of these
procedures is done in LISP and the notation of the query
language is so chosen that its expressions are executable
LISP programs. The totality of these function definitions
and the data base on which they operate constitute the
retrieval  component  of  the  system. 
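
The idea that every construct of the query language is backed by a procedure can be sketched in Python. The sketch is an illustration under several assumptions: the data base contents are invented, expressions are written as nested Python tuples with the delimiters "/", ":" and ";" of the FOR format dropped, and a small interpreter (`evaluate`, a hypothetical name) dispatches to the procedures, whereas in LUNAR the expressions are themselves executable LISP programs. Only EVERY and SOME are handled, and PRINTOUT is omitted.

```python
# Invented toy data base: sample -> rock type and minerals found in it.
SAMPLES = {
    "S10046": {"type": "C", "minerals": {"OLIV", "PLAG"}},
    "S10047": {"type": "C", "minerals": {"OLIV"}},
    "S10048": {"type": "A", "minerals": {"PLAG"}},
}


def contain(sample, mineral):
    """Procedure defining the predicate CONTAIN."""
    return mineral in SAMPLES[sample]["minerals"]


def type_c_samples():
    """Enumeration procedure for the class specifier (SEQ TYPECS)."""
    return [name for name, rec in SAMPLES.items() if rec["type"] == "C"]


PREDICATES = {"CONTAIN": contain}
ENUMERATORS = {"TYPECS": type_c_samples}


def evaluate(expr, env=None):
    """Evaluate a query-language expression written as nested tuples."""
    env = env or {}
    if isinstance(expr, bool):           # the constant T is written True
        return expr
    if isinstance(expr, str):            # bound variable, else a designator
        return env.get(expr, expr)
    op, *args = expr
    if op == "AND":
        return all(evaluate(a, env) for a in args)
    if op == "OR":
        return any(evaluate(a, env) for a in args)
    if op == "NOT":
        return not evaluate(args[0], env)
    if op == "SEQ":
        return ENUMERATORS[args[0]]()
    if op == "TEST":
        return "YES" if evaluate(args[0], env) else "NO"
    if op == "FOR":
        quant, var, cls, restriction, body = args
        # Enumerate the class, keep members satisfying the restriction PX.
        members = [m for m in evaluate(cls, env)
                   if evaluate(restriction, {**env, var: m})]
        results = (evaluate(body, {**env, var: m}) for m in members)
        if quant == "EVERY":
            return all(results)
        if quant == "SOME":
            return any(results)
        raise ValueError("quantifier not handled in this sketch: " + quant)
    return PREDICATES[op](*(evaluate(a, env) for a in args))
```

With this toy data base, the tuple form of (FOR EVERY X1 / (SEQ TYPECS) : (CONTAIN X1 PLAG) ; (CONTAIN X1 OLIV)) is written ("FOR", "EVERY", "X1", ("SEQ", "TYPECS"), ("CONTAIN", "X1", "PLAG"), ("CONTAIN", "X1", "OLIV")) and is evaluated by enumerating the type C samples and calling the CONTAIN procedure on each.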

It  should  be  pointed  out  that  by  virtue  of  this 
definition of the primitive functions and predicates as LISP
functions,  the  query  language  can  be  viewed  simultaneously 
as  a higher-level  programming  language  and  as  an  extension 
of  the  predicate  calculus.  This  gives  rise  to  two  different 
possible  types  of  inference  for  answering  questions, 
corresponding  to  the  philosopher's  distinction  between 
intension  and  extension.  First,  because  of  its  definition 
by  means  of  procedures,  a question  such  as  "Does  every 
sample  contain  silicon?"  can  be  answered  extensional lv  (that 
is  by  appeal  to  the  individuals  denoted  by  the  class  name 
"samples")  by  enumerating  the  individual  samples  and 
cnecking  whether  sodium  has  been  found  in  each  one.  On  the 
other  hand,  this  same  question  could  have  been  answered 
intentionally  (that  is  by  reference  to  its  meanings  alone 
without  reference  to  the  objects  denoted)  by  means  of  the 
application  of  inference  rules  to  other  (intentional)  facts 
such  as  the  assertion  "tvery  sample  contains  some  amount  of 
each  element.”  Thus  the  expressions  in  tne  query  language 
are  capable  either  of  direct  execution  against  the  data  base 
(extensional  mode)  or  manipulation  by  mechanical  inference 
algorithms  or  theorem  provers  (intentional  mode).  Only  the 
former  (extensional)  mode  of  inference  is  actually  used  in 
the  LUNAh  system.  This  gives  rise  to  some  limitations  (e.g. 
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it  is  not  possible  to  prove  most  assertions  about  infinite 
sets  in  extensional  mode),  but  is  very  efficient  for  a 
variety  of  question-answering  applications. 
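
The contrast between the two modes can be made concrete with a
small sketch. The data, the encoding of the general assertion
as a tuple, and both function names are hypothetical; the
intensional half in particular is only a stand-in for a real
inference mechanism, which LUNAR does not use.

```python
# Hypothetical data base: sample id -> elements found in it.
DATA_BASE = {"S10046": {"SILICON", "OXYGEN"}, "S10047": {"SILICON"}}

def answer_extensionally(element):
    """Enumerate the individual samples and check each one."""
    return all(element in found for found in DATA_BASE.values())

# Hypothetical intensional fact, stated without naming any individual:
# "Every sample contains some amount of each element."
GENERAL_FACTS = {("EVERY", "SAMPLE", "CONTAINS-EACH", "ELEMENT")}
ELEMENTS = {"SILICON", "OXYGEN", "IRON"}

def answer_intensionally(element):
    """Instantiate the general fact for one element; no sample is examined."""
    fact = ("EVERY", "SAMPLE", "CONTAINS-EACH", "ELEMENT")
    return fact in GENERAL_FACTS and element in ELEMENTS
```

The intensional function never touches DATA_BASE, which is why
that mode, unlike the extensional one, is not tied to a class
that can be enumerated.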


Semantic  Interpretation 

Having now specified the notation in which we will
represent the meanings of English sentences in our system
and made sure that we understand the nature of the
meanings of the expressions in that notation, we are now
left with the specification of the process whereby meanings
are assigned to sentences. This process is referred to as
semantic interpretation, and in LUNAR it is driven by a set
of formal semantic interpretation rules. The semantic
interpreter operates on a syntactic structure, or a fragment
of one, which has been constructed by the parser, assigning
semantic expressions in the notation to the nodes of this
structure to indicate the "meanings" of those constructions
to the system. In LUNAR this procedure is such that the
interpretation of nodes can be initiated in any order, but
if the interpretation of a node requires the interpretation
of a constituent node, then the interpretation of that
constituent node is performed before the interpretation of
the higher node is completed. Thus, it is possible to
perform the entire semantic interpretation by calling for
the interpretation of the top node (the sentence as a
whole), and this is the normal mode in which the interpreter
is operated in the LUNAR system.
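
The ordering constraint just described amounts to a recursive,
demand-driven traversal. The sketch below is an assumption-laden
illustration: the Node class and its build functions are
hypothetical, and for simplicity every constituent is
interpreted, whereas the text requires this only for the
constituents a rule actually needs.

```python
class Node:
    """A syntax-tree node with a function that builds its interpretation."""
    def __init__(self, label, build, children=()):
        self.label = label
        self.build = build              # combines constituent interpretations
        self.children = list(children)
        self.interpretation = None      # filled in on first request

def interpret(node, trace):
    """Interpret a node, finishing its constituents before the node itself."""
    if node.interpretation is None:
        parts = [interpret(child, trace) for child in node.children]
        node.interpretation = node.build(*parts)
        trace.append(node.label)        # record the order of completion
    return node.interpretation

subject = Node("NP", lambda: "S10046")
obj = Node("OBJ", lambda: "SILICON")
sentence = Node("S", lambda s, o: ("CONTAIN", s, o), [subject, obj])

order = []
meaning = interpret(sentence, order)    # one call on the top node suffices
```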


Semantic Rules

In  determining  the  meaning  of  a construction,  two  types 
of  information  are  used  --  syntactic  information  about 
sentence  construction  and  semantic  information  about 
constituents.  For  example,  in  interpreting  the  meaning  of 
the  sentence,  "S10046  contains  silicon,"  it  is  both  the 
syntactic  structure  of  the  sentence  (subject  = S10046;  verb 
= "contain";  object  = silicon)  plus  the  semantic  facts  that 
S10046 is a sample and silicon is a chemical element that
determine the interpretation (CONTAIN S10046 SILICON).
(Note that the predicate CONTAIN here is the name of a
procedure  in  the  retrieval  component  and  it  is  only  by  the 
"accident"  of  mnemonic  design  that  its  name  happens  to  be 
the  same  as  the  English  word  "contain"  in  the  sentence  that 
we  have  interpreted.) 

In  LUNAR,  this  information  about  the  semantic 
interpretations  of  syntactic  structures  is  embodied  in 
semantic rules consisting of patterns that determine whether
a rule can apply and actions that specify how the semantic
interpretation  is  to  be  constructed.  An  example  of  such  a 
rule  is  given  in  Figure  16. 


(S:SAMPLE-CONTAIN
    (S.NP (MEM 1 (SAMPLE)))
    (S.V (OR (EQU 1 HAVE)
             (EQU 1 CONTAIN)))
    (S.OBJ (MEM 1 (ELEMENT OXIDE ISOTOPE)))
    -> (PRED (CONTAIN (# 1 1) (# 3 1))))


Figure  16:  A Sample  Semantic  Interpretation  Rule 


The name of the rule is S:SAMPLE-CONTAIN, and the left-hand
side, or pattern part of the rule, consists of three
templates which match fragments of syntactic structure. The
first template requires that the sentence being interpreted
have a subject noun phrase which is a member of the semantic
class SAMPLE, the second requires that the verb be either
"have" or "contain", and the third requires a direct object
which is either a chemical element, an oxide or an isotope.
The terms S.NP, S.V and S.OBJ name schemata for tree
fragments which are used not only to test for the presence
of their corresponding syntactic structures in the sentence,
but also to associate reference numbers with selected nodes
in the structure. These numbers are used for reference by
the semantic conditions in the templates and for use in the
right-hand side of the semantic rule. For example, the tree
fragment S.NP locates the subject noun phrase of the
sentence and associates the reference number 1 with that
noun phrase.

The right-hand side, or action part, of the rule
follows the right arrow and specifies that the
interpretation of this node is to be a predicate formed by
inserting the interpretations of two constituent nodes into
the schema (CONTAIN (# 1 1) (# 3 1)), where the expressions
(# m n) refer to the interpretation of the node with
reference number n for template number m in the match of the
left-hand side of the rule.
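
The matching and construction steps for the rule of Figure 16
can be sketched as follows. This is an illustration only: the
flat dictionary standing in for the parse tree, the
SEMANTIC_CLASS table, and the use of the word itself as the
interpretation of a node are all simplifying assumptions, not
the LUNAR implementation.

```python
# Hypothetical dictionary of semantic classes.
SEMANTIC_CLASS = {"S10046": "SAMPLE", "SILICON": "ELEMENT", "FEO": "OXIDE"}

RULE = {
    "name": "S:SAMPLE-CONTAIN",
    # Left-hand side: one (tree fragment, semantic condition) per template.
    "templates": [
        ("subject", lambda w: SEMANTIC_CLASS.get(w) == "SAMPLE"),
        ("verb", lambda w: w in ("HAVE", "CONTAIN")),
        ("object", lambda w: SEMANTIC_CLASS.get(w)
                             in ("ELEMENT", "OXIDE", "ISOTOPE")),
    ],
    # Right-hand side: each (# m n) is written as the pair (m, n).
    "action": ("CONTAIN", (1, 1), (3, 1)),
}

def apply_rule(rule, sentence):
    """Return the interpretation built by the rule, or None if it fails."""
    matched = []
    for fragment, condition in rule["templates"]:
        node = sentence.get(fragment)
        if node is None or not condition(node):
            return None                 # a template failed to match
        matched.append({1: node})       # reference number 1 -> matched node
    predicate, *references = rule["action"]
    return (predicate, *[matched[m - 1][n] for m, n in references])

parse = {"subject": "S10046", "verb": "CONTAIN", "object": "SILICON"}
result = apply_rule(RULE, parse)
```

Here result is the tuple form of (CONTAIN S10046 SILICON); a
sentence whose subject is not a sample yields no interpretation
from this rule.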



BBN Report No. 3067


Bolt  Beranek  and  Newman  Inc. 


Organization  of  Rules 

The semantic rules for interpreting sentences are
usually governed by the verb of the sentence. That is, out
of the entire set of semantic rules, only a relatively small
number of them can possibly apply to a given sentence
because of the verb mentioned in the rule. Similarly, the
rules which interpret noun phrases are governed by the head
noun of the noun phrase. For this reason, the semantic
rules in LUNAR are indexed according to the heads of the
constructions to which they could apply and recorded in the
dictionary entry for the head words. Each rule then
characterizes a syntactic/semantic environment in which a
word can occur and specifies its interpretation in that
environment. The templates of a verb rule thus describe the
necessary and sufficient constituents and semantic
restrictions in order for the verb to be meaningful. Nouns
in noun phrases behave similarly. That is, the semantic
rules not only specify the process of interpretation which
assigns semantic representations, but their left-hand sides
also specify the conditions under which given words and
constructions are meaningful.
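
Indexing rules by the head of the construction can be pictured
as a table from head words to rule names. In the sketch below
only S:SAMPLE-CONTAIN comes from the text; the other rule names
and the helper functions are hypothetical.

```python
from collections import defaultdict

# Dictionary-style index: head word -> names of rules that could apply.
RULES_BY_HEAD = defaultdict(list)

def index_rule(name, heads):
    """Record a rule under every head word mentioned in its pattern."""
    for head in heads:
        RULES_BY_HEAD[head].append(name)

index_rule("S:SAMPLE-CONTAIN", ["HAVE", "CONTAIN"])
index_rule("S:PERSON-HAVE", ["HAVE"])      # hypothetical rule
index_rule("N:SAMPLE", ["SAMPLE"])         # hypothetical noun-phrase rule

def candidate_rules(head):
    """Only the rules indexed under the head need to be tried."""
    return RULES_BY_HEAD.get(head, [])
```

A sentence whose verb is "contain" therefore triggers an attempt
to match one rule rather than the whole rule set.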


Semantic Rules in General

The above presentation is oversimplified in a number of
respects for the sake of expository brevity. There is in
general a greater variety of devices that are used in the
semantic rules of the LUNAR system, and there are numerous
details of operation that we will not consider here (such
as the desired behavior when a template or a rule matches in
several different ways). For more details on these issues,
the reader is referred to Woods (1967) and to Woods, Kaplan,
and Nash-Webber (1972). There are also many other
interesting issues in the semantics of natural language
which have not been explored in the LUNAR system or any
other computer system, which are currently more the domain
of philosophers than computer scientists, but which will
eventually have to be handled by computer systems if they
are to be facile at understanding human language. The
diversity of these issues, however, is beyond the scope of
this presentation.

In many question answering systems semantic
interpretation rules are paired more directly with the
syntactic rules of the grammar so that there is little or no
template matching required (and consequently less latitude
for producing semantic interpretations that are not in
node-for-node correspondence with the syntactic structure).
In still other systems, the semantics are not formalized in
rules, but are simply embodied in arbitrary computer
programs (and consequently totally unconstrained in what
could be done theoretically but providing little or no
theory or conceptual framework for what is going on).
However, the kind of semantic rules that are used in LUNAR
can be used as formal models to explain what is going on in
the semantics of these other systems in which the semantics
is either more restricted or less formalized.


Semantic  Judgments 

As in the case of syntax, semantics has both a
judgmental and a structural aspect. That is, semantic
information is used both to construct semantic
representations of the meanings of the sentences and to
reject anomalous or semantically ill-formed sentences. What
we have described so far has mostly dealt with the
structural aspect -- how to assign a semantic representation
to a sentence and what representation to assign. This
capability is necessary for any language understanding
system whether it is text or speech. In the judgmental
component, however, there are a number of things which
semantics can do which are particularly important for speech
understanding. As we pointed out above, the pattern parts
of the semantic interpretation rules can be used to specify
what assemblages of syntactic structures and lexical words
are meaningful.

In the next few sections, what I would like to do is
briefly survey the uses of semantic information which have
been made in various question answering systems, using the
notion of semantic interpretation rules as presented above
to unify the discussion. I shall no longer be directly
concerned with the use of the rules for the assignment of
semantic interpretations to sentences, but with the
ancillary use of the information embodied in these rules for
other purposes.

Semantic information is used in a number of text
oriented language understanding systems to select
semantically meaningful parsings from among all of the
possible parsings of a sentence. For example, in the
context of airline flight schedules, in interpreting a
sentence such as "Does American have a flight from some east
coast city to Chicago" we can tell that the phrase "to
Chicago" modifies flight and not city because we have
semantic interpretation rules for flights to places while we
do not have any rules to interpret cities to places. In
speech understanding, this ability to determine whether a
given interpretation of a sentence is semantically
meaningful is critical not only for choosing between
alternative parsings, but also for choosing between
alternative segmentations of the input signal into words.
In the next few sections I will discuss some of the
techniques that have been used in various question answering
systems to use semantic information for this judgmental role
and discuss their advantages and limitations for speech
understanding applications.


Semantic Selectional Restrictions

As we mentioned above, the attempt to characterize the
difference between semantically well-formed sentences and
those which are semantically anomalous has been a major
concern of many linguistic semanticists (see e.g. Katz &
Fodor, 1964). The device which is used in most such
attempts is a notion of semantic selectional
restrictions -- restrictions between the verbs of a sentence
and semantic features of the arguments which they can
sensibly take. For example, the restriction that verbs like
"intend" require higher animate subjects is used to explain
the oddness of sentences such as "the rock intends to sit
there." This account assumes that the nouns of the language
can be assigned to semantic classes such as "higher animate"
and that there must be "semantic agreement" or at least no
semantic disagreement between the verb of a sentence and the
subjects, objects and other arguments which it can take. It
is in this area of semantics that the misconceptions about
the distinction between syntax and semantics arise, since
there is usually no difference in principle between the
implementation of such semantic restrictions to reject
semantically anomalous sentences and implementation of
syntactic restrictions such as number agreement to reject
syntactically incorrect sentences. For sufficiently
restricted and fixed domains of discourse, it is possible to
implement such semantic selectional restrictions by
subcategorizing the syntactic categories of the grammar with
classes like 'animate noun' and 'color adjective' rather
than simply noun or adjective. One thereby incorporates the
testing of semantic selectional restrictions into the
grammar and avoids the need for any special mechanism for
testing semantic selectional restrictions.

The technique of semantically subcategorizing the
syntactic categories of the grammar has been applied
effectively in limited speech understanding applications.
It has the advantage of being efficient in execution and
easy to implement for sufficiently simple understanding
tasks. However, one should understand its limitations. One
of the major inadequacies is that the use of semantic
selectional restrictions as prerequisites for grammaticality
or semantic well-formedness is not quite correct. Rather,
most such conditions are required only for a sentence to be
true. When the sentence is a question or when it asserts a
negative possibility, then semantic selectional restrictions
may be violated by perfectly reasonable sentences. A speech
understanding system which contains such restrictions
embedded in its grammar will fail to parse such inputs.
(For example, in Terry Winograd's blocks world program the
sentence "Can a table like blocks?" fails to parse since the
system applies the selectional restriction that "like"
requires an animate subject.) A speech understanding system
which used such selectional restrictions as a prerequisite
for acceptability of an interpretation of a speech signal
would be unable to "hear" this sequence of words no matter
how well articulated and how successful the acoustic and
phonological analysis, but would rather insist on looking
for some other interpretation of the signal.

An additional limitation of the semantic selectional
restriction approach is that the necessary semantic
information associated with a given argument to a verb is
not necessarily associated with the lexical items in the
noun phrase, but may be associated with the referent of the
noun phrase instead. The association of such information
with the dictionary entries for the words is really just an
approximation (albeit a useful one for many applications) of
what one really wants the semantic selectional restrictions
to test.

A major practical difficulty with incorporating the
semantic selectional restrictions into the syntactic
categories of the grammar is the lack of extendability thus
induced. If one wants to apply the system to a different
domain of discourse or to extend the domain slightly, he has
to redefine the categories of the grammar.


Semantic  Screening 

A somewhat more versatile technique for using semantic
information to select an appropriate parsing is to apply
semantic rules to the nodes of the syntactic tree structure
as the nodes are built by the parser. If the node just
constructed fails to have a semantic interpretation, then
that computation path of the parser is rejected and the
parser looks for other ways to parse the input. This
technique of semantic screening applies the semantic
selectional restrictions as a filter or a sieve during the
parsing operation. In its simplest form, the semantic rules
are associated with the rules of a context-free grammar in a
one-to-one fashion so that as soon as a syntactic rule is
applied, the corresponding semantic rule is tested.
Semantic screening is often touted as a mechanism for
gaining increased efficiency in parsing since it tends to
cause early rejection of parsing paths which otherwise would
have been continued further. This argument, however,
neglects to count the cost of the semantic interpretation on
uncompleted parsings which would not have been completed in
any case for syntactic reasons. Whether semantic screening
really provides an increase in efficiency depends on the
relative costs of the extra or unnecessary semantic
processing and the syntactic processing that is thereby
eliminated. In many situations, it is more efficient to
complete the syntactic analysis and then apply the semantic
testing.
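
The efficiency tradeoff can be made concrete by counting work
under a toy model. Everything in the sketch is an assumption
made for illustration: each parse path builds a fixed number of
constituents, one syntactic step is charged per constituent
attempted, one semantic test per constituent interpreted, and
the four paths are invented.

```python
def count_work(paths, length, screen_during_parse):
    """Return (syntactic steps, semantic tests) summed over parse paths.

    Each path is (syn_fail, sem_fail): the index of the constituent at
    which syntax or semantics would reject the path, or None if never.
    """
    syntactic = semantic = 0
    for syn_fail, sem_fail in paths:
        for i in range(length):
            syntactic += 1                 # try to build constituent i
            if i == syn_fail:
                break                      # syntactic failure ends the path
            if screen_during_parse:
                semantic += 1              # test the new constituent at once
                if i == sem_fail:
                    break                  # semantic rejection ends the path
        else:
            if not screen_during_parse:    # parse completed: test afterwards
                for i in range(length):
                    semantic += 1
                    if i == sem_fail:
                        break
    return syntactic, semantic

# One good path, two that fail late for syntactic reasons, and one that
# is semantically anomalous from its first constituent.
PATHS = [(None, None), (3, None), (2, None), (None, 0)]

screened = count_work(PATHS, 4, screen_during_parse=True)
deferred = count_work(PATHS, 4, screen_during_parse=False)
```

In this example screening saves three syntactic steps but
spends five additional semantic tests, most of them on paths
that syntax would have rejected anyway, so which strategy is
cheaper depends on the relative cost of the two kinds of step.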

Another technique which is related to semantic
screening is to apply tests not only of general semantic
well-formedness but also tests of factuality in conjunction
with the formation of a constituent. This is the case for
example in Winograd's system when he makes his decision
about "put the pyramid on the block in the corner" on the
basis of whether there is a pyramid on a block in the
current state of the world and not just on the basis of
general information about whether pyramids can be on blocks.
This technique can be very useful in some situations, but
its exclusive and uncontrolled use would make it impossible
to say things that were not already true or to ask about
things that were not true.


Semantic  Selection 

A major inadequacy of semantic (and of factual)
screening and indeed of any application of semantic
selectional restrictions as strict prerequisites for
well-formedness is its inability to deal with sentences such
as "I saw the man in the park with a telescope" in which
there are many possible parsings which are all semantically
possible, but are not equally plausible. Although it is
possible that I was in a park which contained a telescope
when I saw the man somewhere else, this is not the most
likely interpretation in absence of specific information
that would indicate this interpretation. Rather, there is a
kind of default interpretation that the telescope was
probably used to see the man, and in absence of reason to
believe otherwise the man was probably in the park. What is
required in general rather than a mere rejection of
semantically ill-formed interpretations is a mechanism to
select the most plausible interpretation from among a set of
syntactically related alternatives. Although the solution
of this problem in general is not at hand, a beginning has
been made in a mechanism called selective modifier placement
in the LUNAR parser (see Woods, 1973a). This mechanism uses
information such as the fact that a telescope is an optical
instrument and one can see with an optical instrument to
prefer the alternative of "with a telescope" modifying
"see", while in absence of semantic preference, the modifier
"in the park" modifies the syntactically preceding noun
phrase "man". The technique has not been systematically
developed, however, and except for the placement of
prepositional phrase modifiers, the use of semantic
judgments in LUNAR to select among alternative parsings is
not well developed.
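
A preference-with-default scheme of this kind might be sketched
as below. The tables, the function name and the ordering
convention are hypothetical; the sketch shows the general idea
of preferring a semantically favored attachment and otherwise
falling back on the nearest preceding head, and should not be
read as the LUNAR mechanism itself.

```python
# Hypothetical semantic classes for the objects of prepositions.
IS_A = {"telescope": "optical-instrument", "park": "place"}

# (preposition, class of its object) -> heads it prefers to modify.
PREFERENCES = {("with", "optical-instrument"): {"see"}}

def place_modifier(preposition, noun, candidates):
    """Choose the head a prepositional phrase modifies.

    candidates lists the possible heads, nearest preceding one first.
    """
    preferred = PREFERENCES.get((preposition, IS_A.get(noun)), set())
    for head in candidates:
        if head in preferred:
            return head          # semantic preference wins
    return candidates[0]         # default: syntactically preceding head

# "I saw the man in the park with a telescope"
in_the_park = place_modifier("in", "park", ["man", "see"])
with_a_telescope = place_modifier("with", "telescope", ["park", "man", "see"])
```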


Semantic  Prediction 

All of the preceding techniques for making semantic
judgments about completed syntactic constructions are of
great importance for speech understanding. There are,
however, situations in the course of understanding a speech
utterance where one does not have a complete construction to
work with and would like to make use of semantic information
to guide the speech understander to look for words which
might have been slightly garbled or to provide initial
preferences among the words that are discovered on the basis
of acoustic and lexical analyses alone. Given for example
that we have found the words "sample" and "contain" in a
speech signal, we would like to make use of our semantic
information to predict that there should now occur a word
which is a chemical element, an oxide or an isotope. This
information is contained in our semantic rules (specifically
it is in the left-hand sides of the rules). Similarly, upon
encountering the words "sample" and "contain" among a large
number of other words in the initial word lattice, we would
like to use the semantic information to notice that these
two words are related and perhaps go together in the
interpretation of the utterance. Both of these semantic
roles make use, not of the logical or interpretative sense
of semantics, but of a kind of associational semantics which
studies the semantic relationships among words and concepts.
There are a number of psychologists and psycholinguists as
well as people in artificial intelligence, sociology and
other fields who have been trying to model this aspect of
semantics with various kinds of network structures. The
initial impetus in this area was created by Ross Quillian
(1968, 1969), but other researchers in this area of
semantics include Abelson, Carbonell, Collins, Rumelhart and
Norman, Schank, Simmons, and others (a sampling of most of
these authors is given in Schank & Colby, 1973, and others
are cited explicitly in this paper). The work of Fillmore
(1968) has also been influential in this area of study, and
recently, similar notions have been used at MIT as the basis
for programs that analyze visual scenes (Winston, 1970). I
will describe here some of the characteristics of semantic
networks as Quillian visualized them which have direct
application in speech understanding and which have been
included in the BBN speech understanding system.

Quillian was not interested in the notions of semantics
as characterizing truth. Indeed he denied (I think
erroneously) the psychological relevance of such notions.
Rather, he viewed the "meaning" of a word as merely a
collection of the concepts that are associated with it
(without, however, giving any adequate explication of what
was meant by a concept). I consider Quillian's original
formulation and much of the work that it has stimulated to
be inadequate in the respect that it doesn't give any
attention to a specification of the semantics of the network
notation itself, but that doesn't lessen the validity of
many of the points that he and others of this school have
raised.

Quillian was concerned with investigating the structure
in which humans store information in their brains. Thus,
the so-called semantic networks are really attempts at
finding structures and organizations for storing knowledge.
His concern is not with having a notation in which to write
down a list of facts, but rather with an overall memory
structure in which the interrelationships among those facts,
which humans use for retrieval of information and for
construction of inferences, are explicitly and efficiently
represented. The important thing for Quillian is not so
much the structure of a particular concept, but the network
of relations to other concepts that are established. In
particular, Quillian sought to devise a mechanism and a
structure which would account for the types of semantic
associations which people make and the way these
associations manifest themselves in human language
understanding.

To give a flavor of the kind of network that Quillian
had in mind, Figure 17 (taken from Quillian, 1969) gives
an example of the concept associated with the lexical item
"client". Each lexical item or word points to one or more
"concepts" or nodes in the semantic net (corresponding to
different senses of the word), each of which is merely an
assemblage of pointers to other concept nodes in the
network. In Figure 17 the identifiers PERSON, EMPLOY, and
PROFESSIONAL stand for pointers to other concept nodes in
the network. In Quillian's view, the meaning of a concept
is the sum total of the collection of concept nodes to which
it is connected -- no more and no less. While this gives
very little leverage on solving any of the problems of
semantic interpretation or characterization of truth
conditions, it is a superb mechanism for accomplishing the
semantic predictions and noticing the coincidences among
semantically related words that are required for speech
understanding. In particular, Quillian's notion of semantic
intersection can play an important role in speech
understanding.


Figure 17: A Fragment of a Quillian Network


Semantic Intersection

Quillian developed the notion of semantic intersection
as an attempt to account for the human capability to
immediately identify the relationships between diverse
things such as between 'plant' and 'alive' or (more subtly)
between Madrid and Mexico, and to account for the tendency
of people to accept an ambiguous term in a particular sense
induced by the appropriateness to the context without
noticing the other possible senses (a phenomenon called
"foregrounding"). In foregrounding, the appropriate sense
is somehow brought forward and made more accessible than the
other senses due to the influence of the context. The
mechanism which Quillian proposed to account for such
phenomena and which he believed was the principal process
for accessing information from one's knowledge store was a
process which he called semantic intersection. Quillian
assumed that in the brain, whenever a concept was brought
into consideration in a discourse or whatever, it was
somehow stimulated or "activated" and that this activation
passed out in waves from the source of the stimulation to
the concept nodes to which it was connected. When the
activation waves from two different sources met at some node
in the memory, a semantic intersection was detected, and a
path through the semantic memory was thereby established
which represented the semantic relationship between the two
source concepts (e.g., Madrid is in Spain, which is like
Mexico in language and culture). Similarly, such activations
have some duration in time, and when an ambiguous word is
encountered, the sense that people are likely to take is the
sense which has semantic connections with concepts that are
currently activated (as detected by the presence of semantic
intersections).
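
Activation spreading in waves from two sources until the waves
meet can be sketched as an alternating breadth-first search.
The network fragment, the node names and the function names
below are hypothetical, and real semantic networks carry
labeled links that this sketch ignores.

```python
from collections import deque

# Hypothetical fragment of a semantic network: node -> connected nodes.
LINKS = {
    "MADRID": ["SPAIN"],
    "SPAIN": ["MADRID", "SPANISH-LANGUAGE"],
    "MEXICO": ["SPANISH-LANGUAGE"],
    "SPANISH-LANGUAGE": ["SPAIN", "MEXICO"],
    "CANARY": [],
}

def trace(parents, node):
    """Follow activation tags from a node back to the source of the wave."""
    chain = []
    while node is not None:
        chain.append(node)
        node = parents[node]
    return chain

def intersect(source_a, source_b):
    """Return a path joining the two concepts, or None if the waves never meet."""
    if source_a == source_b:
        return [source_a]
    parents = {source_a: {source_a: None}, source_b: {source_b: None}}
    frontier = {source_a: deque([source_a]), source_b: deque([source_b])}
    while frontier[source_a] or frontier[source_b]:
        for source, other in ((source_a, source_b), (source_b, source_a)):
            for _ in range(len(frontier[source])):      # advance one wave
                node = frontier[source].popleft()
                for neighbor in LINKS.get(node, []):
                    if neighbor in parents[source]:
                        continue                        # already activated
                    parents[source][neighbor] = node
                    if neighbor in parents[other]:      # the waves have met
                        left = trace(parents[source_a], neighbor)[::-1]
                        right = trace(parents[source_b], neighbor)[1:]
                        return left + right
                    frontier[source].append(neighbor)
    return None

path = intersect("MADRID", "MEXICO")
```

The returned path plays the role of the semantic relationship
between the two source concepts.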

In speech understanding, this foregrounding effect of
semantic intersections can be used to influence the words
that one hears in an otherwise ambiguous segment of speech,
and can be used to detect the coincidences of semantically
related words in a word lattice. Following connections
through the semantic network can also be used to predict
words that have not been detected in the signal but which
are sufficiently likely that they should be looked for. For
details on the use of such techniques in the understanding
of continuous speech, the reader is again referred to
Nash-Webber (1974, 1975). Notice that the information that
we have in the pattern parts of the semantic interpretation
rules of LUNAR is one type of information that we would like
to have in such a semantic network. Notice also that
whereas in LUNAR the information about associated semantic
classes is available conveniently if one starts with the
head of a construction, similar information in a semantic
network format would be equally accessible from any of the
concepts involved in the rule. This is one more instance of
the importance of breaking a priori orderings of processing
in speech understanding in favor of multiple, redundant ways
of achieving the same result. In any given utterance, it
could be one of the critical head words that is garbled, and
one would like to be able nevertheless to find the semantic
relationships among the arguments and use them to predict
the missing head.
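
Making the pattern information accessible from any concept,
rather than only from the head, can be sketched by indexing
each pattern under every concept it mentions. The pattern
encoding, the slot names and the predict function are
hypothetical; only the classes SAMPLE, ELEMENT, OXIDE and
ISOTOPE and the verbs come from the rule discussed earlier.

```python
from collections import defaultdict

# Hypothetical encoding of rule patterns: slot -> admissible concepts.
PATTERNS = [
    {"head": {"have", "contain"},
     "subject": {"SAMPLE"},
     "object": {"ELEMENT", "OXIDE", "ISOTOPE"}},
]

# Index every pattern under each concept it mentions, not just its head.
INDEX = defaultdict(list)
for number, pattern in enumerate(PATTERNS):
    for concepts in pattern.values():
        for concept in concepts:
            INDEX[concept].append(number)

def predict(found):
    """For each pattern touched by the found concepts, list its open slots."""
    found = set(found)
    seen, predictions = set(), []
    for concept in found:
        for number in INDEX.get(concept, []):
            if number in seen:
                continue
            seen.add(number)
            pattern = PATTERNS[number]
            filled = {slot for slot, ok in pattern.items() if ok & found}
            predictions.append(
                {slot: ok for slot, ok in pattern.items() if slot not in filled})
    return predictions

expected_object = predict({"SAMPLE", "contain"})
expected_head = predict({"SAMPLE", "ELEMENT"})
```

Starting from the arguments alone, the second call recovers the
verbs that could fill a garbled head position.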


Other  Aspects  of  Semantic  Nets  and 
Knowledge  Representation 

Another notion embedded in Quillian's conception of a
semantic network (which also has rudimentary beginnings in
Raphael's SIR system (Raphael, 1964)) is that information
about a concept can be stored at several different levels up
a chain of more and more inclusive concepts (Quillian called
them superconcepts). For example, a canary is a bird which
is a type of animal which is in turn a physical object. It
may thus have certain properties which are stored directly
at the level of canary (such as being yellow) but other
properties that are common to a great many concepts and
which are stored at the most general level of applicability.
These would be automatically inherited by subconcepts (in
the absence of contrary information) without having to be
stored over and over again for each of the entities for which
they are true.
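
A minimal sketch of this kind of inheritance follows. The canary chain
is the one given above; the penguin entry, the property names other than
the canary's color, and the function lookup are assumptions added for
illustration. A property is sought first at the concept itself and then
at each superconcept in turn, so a more specific entry overrides an
inherited one.

```python
# Superconcept chain from the canary example; "penguin" is a hypothetical
# addition used to show locally stored contrary information.
SUPERCONCEPT = {
    "canary": "bird",
    "penguin": "bird",
    "bird": "animal",
    "animal": "physical object",
}

# Each property is stored once, at the most general level where it applies.
# All property names except the canary's color are illustrative assumptions.
PROPERTIES = {
    "canary": {"color": "yellow"},
    "penguin": {"can_fly": False},
    "bird": {"can_fly": True, "has_wings": True},
    "animal": {"breathes": True},
    "physical object": {"has_mass": True},
}

def lookup(concept, prop):
    """Return the value of prop for concept, climbing the superconcept chain."""
    node = concept
    while node is not None:
        local = PROPERTIES.get(node, {})
        if prop in local:
            return local[prop]  # the nearest level wins over more general ones
        node = SUPERCONCEPT.get(node)
    raise KeyError(f"{prop!r} is not known for {concept!r}")

print(lookup("canary", "color"))     # stored directly at the canary level
print(lookup("canary", "has_mass"))  # inherited from "physical object"
print(lookup("penguin", "can_fly"))  # contrary information stored locally
```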

There is a tremendous amount of interest right now in
various semantic network representations, what such
structures should look like, how they should be used to do
inferences, what kinds of things should be put into a
network in response to understanding a sentence, etc. In
particular, it is pointed out, notably by Schank and his
students, that a great deal of what is understood in
response to an input sentence comes from gratuitous
assumptions that are made on the basis of knowledge already
in memory and not specifically transmitted by the sentence.
For example, Schank cites dialog pairs such as "Would you
like an ice cream cone?" and "I just ate", in which the
second utterance should be interpreted as giving a negative
answer to the question. I think it should be apparent that
when one attempts to understand spoken discourses and make
judgments about the contextual appropriateness of a given
interpretation of an utterance, the ability to make such
semantic inferences using large amounts of semantic and
factual knowledge (as well as pragmatic knowledge about what
the speaker is likely to say in a given situation) will be
of paramount importance. The inability to account for a
given interpretation of an utterance by being able to relate
it to what has been said before or to some aspect of the
current context should raise the possibility that the
utterance has been misheard. The ability to fully use this
level of sophisticated inference as part of a speech
understanding system, however, will probably have to await
further developments in the ongoing studies in knowledge
representation and mechanical inference. The techniques
which exist today in these areas are either extremely
limited or inordinately cumbersome.


CONCLUSION

I have attempted here to provide a perspective on some
of the work that has been done in the areas of syntax and
semantics for understanding natural language by machines and
to call special attention to those techniques which have
particular relevance to the problems of speech
understanding.

I have tried to cover a range of different parsing
algorithms and grammar models with emphasis on the
advantages and disadvantages of the various features of
these models for the particular types of problems that one
will encounter in analyzing continuous speech, and I have
tried to give my opinions as to the value of these different
features. In particular, I have argued that the use of a
predetermined order of finding things (such as left to right
across the sentence) is potentially dangerous and perhaps to
be avoided. I have pointed out that the ambiguity of
syntactic word class (which is one of the major sources of
ambiguity in English text) is greatly magnified in speech
understanding by the inability to uniquely determine what
the word at a given position is. Whereas in text parsing,
one at least knows what the word is and therefore has an
expectation of two or three possible syntactic categories
for it, in speech understanding we may have a half dozen
alternative possible words at a given point, each with one
or more possible syntactic categories. Hence the
combinatorial problems that arise from the multiplication of
possible alternative analyses are much worse for speech.
This is complicated by the fact that most of the techniques
that have been developed in text parsers for minimizing the
impact of these combinatorial possibilities require
carefully designed sequences of looking for things, which
conflict with the above observation that such constrained
orderings are sensitive to the errors in the lexical
analysis of the input that are virtually inevitable in
speech.
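
The scale of this multiplication can be made concrete with a small
calculation. The sketch below rests on a strong simplifying assumption
that is not in the text: word boundaries are taken to be aligned, so
that an utterance is a fixed series of positions, each with its own set
of candidate words. The function count_category_sequences and the
sample figures (five positions, two categories per word) are
hypothetical.

```python
from math import prod

def count_category_sequences(positions):
    """Count distinct (word, category) sequences through aligned positions.

    positions is a list with one entry per position; each entry is a list
    of candidate words, and each word is given as its list of categories.
    """
    return prod(sum(len(categories) for categories in words)
                for words in positions)

# Text: the word at each of 5 positions is known and has 2 categories.
text = [[["N", "V"]] for _ in range(5)]

# Speech: 6 candidate words at each of 5 positions, 2 categories each.
speech = [[["N", "V"]] * 6 for _ in range(5)]

print(count_category_sequences(text))    # 2 ** 5 = 32
print(count_category_sequences(speech))  # 12 ** 5 = 248832
```

In a real word lattice the candidate words span different intervals, so
the count depends on which paths connect, but the multiplicative growth
is the same.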

The use of word lattices as input instead of sequences
and the design of parsing algorithms around well-formed
substring tables or charts appear to be viable methods for
dealing with the combinatorial problem of speech
understanding. The merging of common parts of different
analyses permitted by transition network grammars is also
helpful in this respect. In order to be able to correct
errors, it will be essential to be able to come at a given
parsing from several directions. Consequently, checks will
be necessary at appropriate points to avoid duplicating an
analysis that has already been found.
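
As an illustration only, the following sketch shows a well-formed
substring table built over a word lattice, with the duplicate check
mentioned above. It is a generic bottom-up chart procedure with binary
rules, not one of the parsers discussed in this paper, and the lexicon,
rules, lattice, and the name parse_lattice are all hypothetical. Each
chart entry records only a start point, an end point, and a category, so
alternative words of the same category over the same interval merge
into a single entry.

```python
# Hypothetical lexicon, grammar, and lattice for illustration.
LEXICON = {
    "the": {"DET"}, "a": {"DET"},
    "rock": {"N", "V"}, "lock": {"N", "V"},
    "fell": {"V"},
}
RULES = {("DET", "N"): "NP", ("NP", "V"): "S"}  # (left, right) -> parent

# Word lattice: (start point, end point, candidate word).
LATTICE = [(0, 1, "the"), (0, 1, "a"),
           (1, 2, "rock"), (1, 2, "lock"),
           (2, 3, "fell")]

def parse_lattice(lattice, lexicon, rules):
    """Return the set of (start, end, category) constituents found."""
    chart = set()
    agenda = [(s, e, cat)
              for (s, e, word) in lattice
              for cat in lexicon.get(word, ())]
    while agenda:
        edge = agenda.pop()
        if edge in chart:
            continue  # already found by another route; do not redo the work
        chart.add(edge)
        s, e, cat = edge
        for (s2, e2, cat2) in list(chart):
            # The new edge may extend a neighbor on its left ...
            if e2 == s and (cat2, cat) in rules:
                agenda.append((s2, e, rules[(cat2, cat)]))
            # ... or be extended by a neighbor on its right.
            if e == s2 and (cat, cat2) in rules:
                agenda.append((s, e2, rules[(cat, cat2)]))
    return chart

chart = parse_lattice(LATTICE, LEXICON, RULES)
print((0, 3, "S") in chart)
```

Since every new entry is combined with its neighbors on both sides, the
result does not depend on the order in which lattice entries arrive, so
an analysis can be reached from either direction without being built
twice.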

Another important role of syntax in a speech
understanding system is the prediction of those places where
small function words might occur in order to compensate for
the unreliability of their identification by lexical
analysis.

Although our understanding of semantics is not as well
advanced as that of syntax (which itself is far from
complete), there are a number of semantic techniques that
language understanding programs have used which can have
great benefit in the construction of speech understanding
machines. These include the use of procedural semantics for
the specification of the operations which are to be carried
out in response to the understanding of the sentence, the
use of semantic selectional restrictions to rule out
unlikely interpretations of the speech signal, and the use
of semantic associations as embodied in the Quillian
semantic intersection technique to notice coincidences
between semantically related words at different points in
the input. One should be aware, however, of the limitations
of some of these techniques and the need for continued
research in the areas both of syntax and semantics in order
to increase the range of things which such systems can
understand and their abilities to choose correctly between
alternative interpretations of a signal.

I think it is clear that in order to cover the scope of
material that I have, it has been necessary to treat many
issues rather shallowly and others not at all. Hopefully
the references will provide additional detail for the
interested reader to follow up. I hope that I have given
you some feeling for the issues and some of the things that
are going on in computational linguistics, linguistics,
psychology, and artificial intelligence relative to syntax
and semantics and some of the ramifications of the speech
understanding task for these areas. Given the different
perspective that the speech understanding task places on the
roles of syntax and semantics in language understanding, I
believe that the speech understanding problem can have
almost as great an impact on research in syntax and
semantics as these areas are now having on the problem of
automatic speech recognition.